The optimal discovery procedure

Description

odp performs the optimal discovery procedure (ODP), a framework for optimally performing many hypothesis tests in a high-dimensional study. When testing whether a feature is significant, the ODP uses information from all features rather than from that feature alone.

Usage

odp(object, de.fit, odp.parms = NULL, weights = NULL, bs.its = 100,
  n.mods = 50, seed = NULL, verbose = TRUE, ...)

## S4 method for signature 'deSet,missing'
odp(object, de.fit, odp.parms = NULL,
  weights = NULL, bs.its = 100, n.mods = 50, seed = NULL,
  verbose = TRUE, ...)

## S4 method for signature 'deSet,deFit'
odp(object, de.fit, odp.parms = NULL,
  weights = NULL, bs.its = 100, n.mods = 50, seed = NULL,
  verbose = TRUE, ...)

Arguments

object

S4 object: deSet

de.fit

S4 object: deFit. Optional.

odp.parms

list: parameters for each cluster. See kl_clust.

weights

matrix: weights for each observation. Default is NULL.

bs.its

numeric: number of null bootstrap iterations. Default is 100.

n.mods

integer: number of clusters used in kl_clust. Default is 50.

seed

integer: set the seed value. Default is NULL.

verbose

logical: print iteration progress for the bootstrap method. Default is TRUE.

...

Additional arguments for qvalue and empPvals.

Details

The computational cost of the full ODP estimator grows quadratically with the number of genes, which becomes prohibitive for large studies. An alternative method, mODP, is therefore used; it has been shown to provide very similar results. mODP uses a clustering algorithm in which genes are assigned to clusters (modules) based on the Kullback-Leibler distance. Each gene is assigned a module-average parameter for calculating the ODP score, which reduces the computation time to approximately linear in the number of genes (see Woo, Leek and Storey 2010). If the number of clusters equals the number of genes, the original ODP is implemented; depending on the number of hypothesis tests, this can take some time.
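
As a rough sketch of where the quadratic cost comes from, the plug-in ODP score can be written in an unweighted form as below. The notation is an assumption made for illustration and does not come from this page: m genes, fitted alternative and null densities \hat{g}_i and \hat{f}_i for gene i, and K modules of sizes n_k with module-average densities \bar{g}_k and \bar{f}_k. The modular form is a simplified reading of Woo, Leek and Storey (2010), not necessarily the exact computation performed by odp.

```latex
% Plug-in ODP score for the data x_j of gene j: every gene's fitted
% densities are evaluated at x_j, so scoring all m genes costs O(m^2).
\hat{S}_{\mathrm{ODP}}(x_j) =
  \frac{\sum_{i=1}^{m} \hat{g}_i(x_j)}{\sum_{i=1}^{m} \hat{f}_i(x_j)}

% Modular approximation: genes in module k share averaged parameters,
% so only K densities are evaluated per gene and the cost is O(mK).
\hat{S}_{\mathrm{mODP}}(x_j) =
  \frac{\sum_{k=1}^{K} n_k \, \bar{g}_k(x_j)}{\sum_{k=1}^{K} n_k \, \bar{f}_k(x_j)},
\qquad \sum_{k=1}^{K} n_k = m
```

With K = m (one gene per module, so every n_k = 1) the second expression reduces to the first, which is consistent with the original ODP being recovered when the number of clusters equals the number of genes. For a fixed K, such as the default n.mods = 50, the cost grows linearly in m.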

Value

deSet object

Author(s)

John Storey, Jeffrey Leek, Andrew Bass

References

Storey JD. (2007) The optimal discovery procedure: A new approach to simultaneous significance testing. Journal of the Royal Statistical Society, Series B, 69: 347-368.

Storey JD, Dai JY, and Leek JT. (2007) The optimal discovery procedure for large-scale significance testing, with applications to comparative microarray experiments. Biostatistics, 8: 414-432.

Woo S, Leek JT, and Storey JD. (2010) A computationally efficient modular optimal discovery procedure. Bioinformatics, 27(4): 509-515.

See Also

kl_clust, build_models and deSet

Examples

# load packages and import data
library(edge)
library(splines)
data(kidney)
age <- kidney$age
sex <- kidney$sex
kidexpr <- kidney$kidexpr
cov <- data.frame(sex = sex, age = age)

# create models
null_model <- ~sex
full_model <- ~sex + ns(age, df = 4)

# create deSet object from data
de_obj <- build_models(data = kidexpr, cov = cov,
                       null.model = null_model, full.model = full_model)

# odp method
de_odp <- odp(de_obj, bs.its = 30)

# optionally supply a precomputed deFit object and clustering parameters
de_fit <- fit_models(de_obj, stat.type = "odp")
de_clust <- kl_clust(de_obj, n.mods = 10)
de_odp <- odp(de_obj, de.fit = de_fit, odp.parms = de_clust,
              bs.its = 30)

# summarize object
summary(de_odp)