
Given two gene lists, tests the significance of their overlap in comparison with a genomic background. The null hypothesis is that the odds ratio is no larger than 1. The alternative is that the odds ratio is larger than 1. It returns the p-value, the estimated odds ratio and the intersection.


`object`
A GeneOverlap object.

`x`
A GeneOverlap object.

`...`
They are not used.

The problem of gene overlap testing can be described by a hypergeometric
distribution in which one gene list A defines the number of white balls in the
urn and the other gene list B defines the number of balls drawn. The
intersection is then the number of white balls in the draw. Assume the total
number of genes is `n`, the number of genes in A is `a` and the number of
genes in B is `b`. If the intersection between A and B is `t`, the
probability of seeing exactly `t` can be calculated as:

`dhyper(t, a, n - a, b)`

Without loss of generality, we can assume `b` <= `a`, so the
largest possible value for `t` is `b`. Therefore, the p-value of
seeing an intersection of at least `t` is:

`sum(dhyper(t:b, a, n - a, b))`
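The same upper-tail probability can also be read off the cumulative distribution function instead of summing densities. The following is a minimal sketch, not part of GeneOverlap: the helper name `overlap_pvalue` is hypothetical, and the numbers passed to it are illustrative values only.

```r
# Hypothetical helper (not part of GeneOverlap): upper-tail hypergeometric
# p-value for an overlap of size t between gene lists of size a and b,
# given a background of n genes.
overlap_pvalue <- function(t, a, b, n) {
  # P(X >= t) equals P(X > t - 1) for a discrete distribution, where
  # X counts white balls among b draws from a white and n - a black balls.
  phyper(t - 1, a, n - a, b, lower.tail = FALSE)
}

# Illustrative values: n = 200, a = 70, b = 30, t = 10.
p_tail <- overlap_pvalue(10, 70, 30, 200)
p_sum  <- sum(dhyper(10:30, 70, 130, 30))
all.equal(p_tail, p_sum)  # compare the two ways of computing the tail
```

Using `phyper` avoids building the vector `t:b`, which may matter little for small lists but keeps the call short.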

Fisher's exact test frames this problem slightly differently, but its calculation is also based on the hypergeometric distribution. It starts by constructing a contingency table:

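```
# A and B are vectors of gene IDs; n is the total number of genes.
matrix(c(n - length(union(A, B)), length(setdiff(A, B)),
         length(setdiff(B, A)), length(intersect(A, B))),
       nrow=2)
```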

It therefore tests the independence between A and B and is conceptually more straightforward. The GeneOverlap class is implemented using Fisher's exact test.
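As a rough sketch of how such a table could be built from actual gene vectors, the snippet below uses invented toy data. The gene IDs and the variable names `genome`, `A`, `B` and `contbl` are assumptions for illustration, and this is not the code GeneOverlap uses internally.

```r
# Toy data invented for illustration: a background of 200 gene IDs.
genome <- paste0("g", 1:200)
A <- genome[1:70]    # 70 genes
B <- genome[61:90]   # 30 genes, 10 of which are also in A

n <- length(genome)

# Cells, filled column by column: in neither list, in A only,
# in B only, in both lists.
contbl <- matrix(c(n - length(union(A, B)), length(setdiff(A, B)),
                   length(setdiff(B, A)),   length(intersect(A, B))),
                 nrow = 2)
contbl
sum(contbl) == n  # every gene in the background falls in exactly one cell
```

Swapping the two off-diagonal cells transposes the table, which leaves the result of Fisher's exact test unchanged.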

An example helps to illustrate the concept. Assume we have a genome of 200 genes and two gene lists of 70 and 30 genes. If the intersection between the two contains 10 genes, the hypergeometric way to calculate the p-value is:

`sum(dhyper(10:30, 70, 130, 30))`

which gives a p-value of 0.6561562. Using Fisher's exact test instead, we run:

```
fisher.test(matrix(c(110, 20, 60, 10), nrow=2),
alternative="greater")
```

which gives the same p-value. In addition, `fisher.test` also provides an estimated odds ratio, a confidence interval, etc.
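The agreement between the two approaches can be checked directly. This is a small sketch using the numbers from the example, with `p_hyper` and `ft` as arbitrary variable names:

```r
# Hypergeometric tail probability for the example.
p_hyper <- sum(dhyper(10:30, 70, 130, 30))

# Fisher's exact test on the corresponding contingency table.
ft <- fisher.test(matrix(c(110, 20, 60, 10), nrow = 2),
                  alternative = "greater")

all.equal(p_hyper, ft$p.value)  # compare up to floating-point tolerance
ft$estimate   # conditional maximum likelihood estimate of the odds ratio
ft$conf.int   # one-sided confidence interval for the odds ratio
```

Because the alternative is one-sided, the upper bound of the confidence interval is infinite.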

The Jaccard index measures the similarity between two sets. It is defined as the size of their intersection divided by the size of their union.
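The definition translates into a one-line computation. The sketch below is illustrative only: `jaccard_index` is a hypothetical helper rather than a GeneOverlap function, and the gene IDs are made up.

```r
# Hypothetical helper (not part of GeneOverlap): Jaccard index of two sets
# given as vectors of gene IDs.
jaccard_index <- function(A, B) {
  length(intersect(A, B)) / length(union(A, B))
}

# Made-up gene IDs: 2 shared genes out of 5 distinct genes.
A <- c("g1", "g2", "g3", "g4")
B <- c("g3", "g4", "g5")
jaccard_index(A, B)
```

For the example above with lists of 70 and 30 genes sharing 10, the same formula gives 10 / 90.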

Although Fisher's exact test is chosen for implementation, it should be
noted that the R implementation of Fisher's exact test is slower than using
`dhyper` directly. As an example, running:

`system.time(sum(dhyper(10e3:30e3, 70e3, 130e3, 30e3)))`

takes around 0.016s to finish, while running:

```
system.time(fisher.test(matrix(c(110e3, 20e3, 60e3, 10e3), nrow=2),
alternative="greater"))
```

takes around 0.072s. In practice, this time difference can often be ignored.

Li Shen <li.shen@mssm.edu>

Mount Sinai profile: http://www.mountsinai.org/profiles/li-shen

Personal: http://www.linkedin.com/in/lshen/

http://en.wikipedia.org/wiki/Fisher's_exact_test

http://en.wikipedia.org/wiki/Jaccard_index

```
data(GeneOverlap)
go.obj <- newGeneOverlap(hESC.ChIPSeq.list$H3K4me3,
                         hESC.ChIPSeq.list$H3K9me3,
                         gs.RNASeq)
go.obj <- testGeneOverlap(go.obj)
go.obj  # show.
print(go.obj)  # more details.
getContbl(go.obj)  # contingency table.
```
