mhg_test: Test for enrichment in a ranked binary list.

Description

Given a ranked binary list of ones and zeros, use the minimum-hypergeometric (mHG) test to test whether the ones are enriched at the beginning of the list.

Usage

mhg_test(x, N, K, L, X, upper_bound = FALSE, tol = 1e-16)

Arguments

x

Binary vector of ones and zeros, ordered by rank (the first element is the top-ranked observation).

N

Size of the population.

K

Number of successes in the population.

L

Only consider scores for the first L observations.

X

Require at least X ones to get a score less than 1.

upper_bound

If TRUE, skip the dynamic programming algorithm and return an upper bound on the p-value instead.

tol

Tolerance used when testing two floating-point numbers for equality.

Details

Suppose we have a set of N = 5000 genes, K = 100 of which are annotated with a Gene Ontology (GO) term. Further, suppose that some subset of these genes is significantly differentially expressed (DE) between two conditions, and that k = 15 of the DE genes are annotated with the GO term. We would like to know whether the GO term is enriched for DE genes.

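We use the hypergeometric distribution to compute the probability of observing at least k annotated genes among the DE genes if the DE genes were a random sample from the population. For more details, see the documentation for dhyper. This single test depends on the cutoff used to call genes DE. When the genes are instead ranked (for example, by DE p-value), the mHG test avoids choosing a cutoff by scoring every prefix of the ranked list.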
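The hypergeometric tail for the GO example can be computed directly in base R. The text gives N, K, and k but not the number of DE genes, so the sketch below assumes a hypothetical n = 200 DE genes purely for illustration. It computes P(X >= k) with phyper() and checks it against the sum of dhyper() point probabilities.

```r
# Worked example for the GO scenario in Details.
# N, K, and k come from the text; n (the number of DE genes) is NOT given,
# so n = 200 is a hypothetical value chosen for illustration.
N <- 5000L  # genes in the population
K <- 100L   # genes annotated with the GO term
n <- 200L   # hypothetical number of DE genes (assumption)
k <- 15L    # DE genes annotated with the GO term

# Expected overlap if the DE genes were a random sample: n * K / N.
expected <- n * K / N

# Upper tail P(X >= k) for X ~ Hypergeometric(N, K, n).
# phyper(q, m, n, k) takes m = annotated, n = not annotated, k = draws.
p_tail <- phyper(k - 1L, K, N - K, n, lower.tail = FALSE)

# The same quantity by summing point probabilities from dhyper().
p_sum <- sum(dhyper(k:min(K, n), K, N - K, n))

c(expected = expected, p_tail = p_tail, p_sum = p_sum)
```

With these numbers about 4 annotated DE genes would be expected by chance, so observing 15 gives a small tail probability. The result changes with n, which is exactly the cutoff dependence that the mHG test is designed to remove.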

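The method (Eden et al., 2007) consists of three steps:

1. Compute the hypergeometric probability at each rank of the list, that is, for each prefix of length 1 to L. Ranks with fewer than X ones receive a score of 1.
2. Take the minimum of these probabilities, the mHG, as the test statistic.
3. Use dynamic programming to compute the exact p-value of observing an mHG at least as small as the observed one.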
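The three steps of the mHG method (score every prefix, take the minimum, then compute its exact p-value by dynamic programming) can be sketched in base R as follows. The function names hg_tail, mhg_statistic, mhg_pvalue, and mhg_sketch are hypothetical, and the treatment of L, X, and tol is an assumption based on the argument descriptions; this is not the mhg package's implementation. The dynamic program walks through the list one position at a time, tracking the probability that a uniformly random arrangement of K ones among N positions has k ones so far without yet scoring at or below the observed mHG.

```r
# Minimal base-R sketch of the mHG test. Function names are hypothetical and
# the handling of L, X, and tol is an assumption; NOT the mhg package's code.

# Hypergeometric upper tail: P(at least k ones among the first n positions)
# when K ones are placed uniformly at random among N positions.
hg_tail <- function(k, n, N, K) {
  phyper(k - 1, K, N - K, n, lower.tail = FALSE)
}

# Steps 1 and 2: score every prefix of length 1..L and keep the minimum.
mhg_statistic <- function(x, N, K, L, X) {
  n <- seq_len(L)
  k <- cumsum(x[n])                # ones seen in each prefix
  scores <- hg_tail(k, n, N, K)
  scores[k < X] <- 1               # fewer than X ones: score of 1
  i <- which.min(scores)           # first prefix attaining the minimum
  list(threshold = i, mHG = scores[i])
}

# Step 3: exact p-value by dynamic programming. p[k + 1] is the probability
# that a random arrangement has k ones in the first i positions and has not
# yet produced a score <= mHG + tol.
mhg_pvalue <- function(mhg, N, K, L, X, tol = 1e-16) {
  k <- 0:K
  p <- c(1, numeric(K))            # before position 1: zero ones, prob 1
  pvalue <- 0
  for (i in seq_len(L)) {
    # Chance that position i holds a one, given k ones in positions 1..(i - 1).
    q <- (K - k) / (N - i + 1)
    move <- p * q
    p <- p * (1 - q) + c(0, move[-(K + 1)])
    scores <- hg_tail(k, i, N, K)
    scores[k < X] <- 1
    hit <- scores <= mhg + tol
    pvalue <- pvalue + sum(p[hit]) # mass reaching the mHG for the first time
    p[hit] <- 0                    # stop tracking paths that already hit
  }
  min(1, pvalue)
}

mhg_sketch <- function(x, N, K, L, X, tol = 1e-16) {
  res <- mhg_statistic(x, N, K, L, X)
  res$pvalue <- mhg_pvalue(res$mHG, N, K, L, X, tol)
  res
}

# Toy usage: both ones at the top of a list of six.
res_toy <- mhg_sketch(c(1, 1, 0, 0, 0, 0), N = 6, K = 2, L = 6, X = 1)
```

For the toy list, the only arrangement that scores as well as the observed one is the observed one itself, so the sketch reports threshold 2 and mHG = pvalue = 1/15 (one of choose(6, 2) = 15 arrangements). Tracking probability mass instead of counting lattice paths avoids very large integers, and accumulating the mass that hits (rather than computing 1 minus the survivors) keeps small p-values accurate. By a union bound over the L prefixes, this p-value never exceeds min(1, L * mHG); whether upper_bound = TRUE returns exactly that bound is not stated here.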

Value

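A list with items "threshold" (the rank at which the minimum score is attained), "mHG" (the minimum hypergeometric score), and "pvalue" (the p-value of the mHG statistic).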

References

Eden, E., Lipson, D., Yogev, S. & Yakhini, Z. Discovering motifs in ranked lists of DNA sequences. PLoS Comput. Biol. 3, e39 (2007). http://dx.doi.org/10.1371/journal.pcbi.0030039

Wagner, F. GO-PCA: An Unsupervised Method to Explore Biological Heterogeneity Based on Gene Expression and Prior Knowledge. bioRxiv (2015). http://dx.doi.org/10.1101/018705

See Also

plot_mhg

Examples

library(mhg)

# Size of the population.
N <- 5000L
# Successes in the population.
K <- 100L
# Only consider enrichments in the first L observations (integer division).
L <- N %/% 4L
# Require at least X successes in the first L observations.
X <- 5L

set.seed(42)

# Binary vector of successes and failures.
x <- rep(0, N)
x[sample(100, 5)] <- 1
x[sample(200, 10)] <- 1

res <- mhg_test(x, N, K, L, X)

abs(res$pvalue - 1.810658e-05) < 1e-6 # TRUE

# Plot the result.
plot_mhg(sort(rnorm(N)), x, res, L)
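Because the mHG p-value is the probability that a uniformly random arrangement of the same ones scores at least as well as the observed list, it can be cross-checked by enumeration when choose(N, K) is small. The sketch below assumes N = length(x) and K = sum(x), and reuses the scoring rule assumed for L and X (ranks with fewer than X ones score 1). The helper names min_hg_score and mhg_pvalue_bruteforce are hypothetical, not part of the mhg package.

```r
# Brute-force cross-check for tiny inputs (hypothetical helper names).
# Assumes x holds exactly K = sum(x) ones among N = length(x) positions and
# uses an assumed scoring rule for L and X; NOT the package's code.
min_hg_score <- function(x, L, X) {
  N <- length(x)
  K <- sum(x)
  n <- seq_len(L)
  k <- cumsum(x[n])                # ones seen in each prefix
  scores <- phyper(k - 1, K, N - K, n, lower.tail = FALSE)
  scores[k < X] <- 1               # fewer than X ones: score of 1
  min(scores)
}

mhg_pvalue_bruteforce <- function(x, L, X, tol = 1e-16) {
  N <- length(x)
  K <- sum(x)
  observed <- min_hg_score(x, L, X)
  placements <- combn(N, K)        # each column: positions of the K ones
  stats <- apply(placements, 2, function(pos) {
    y <- numeric(N)
    y[pos] <- 1
    min_hg_score(y, L, X)
  })
  mean(stats <= observed + tol)    # share of arrangements at least as extreme
}

p_brute <- mhg_pvalue_bruteforce(c(1, 1, 0, 0, 0, 0), L = 6, X = 1)
```

Here only the observed arrangement reaches a score of 1/15, so the enumeration gives 1/15. If mhg_test uses the same scoring rule, mhg_test(c(1, 1, 0, 0, 0, 0), 6, 2, 6, 1) would be expected to agree; that is an expectation, not a documented guarantee.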

slowkow/mhg documentation built on May 30, 2019, 3:06 a.m.