dSorensen: Computation of the Sorensen-Dice dissimilarity

View source: R/dsorensen.R

dSorensen    R Documentation

Description

Computation of the Sorensen-Dice dissimilarity

Usage

dSorensen(x, ...)

## S3 method for class 'table'
dSorensen(x, check.table = TRUE, ...)

## S3 method for class 'matrix'
dSorensen(x, check.table = TRUE, ...)

## S3 method for class 'numeric'
dSorensen(x, check.table = TRUE, ...)

## S3 method for class 'character'
dSorensen(x, y, check.table = TRUE, ...)

## S3 method for class 'list'
dSorensen(x, check.table = TRUE, ...)

## S3 method for class 'tableList'
dSorensen(x, check.table = TRUE, ...)

Arguments

x

either an object of class "table", "matrix" or "numeric" representing a 2x2 contingency table, a "character" vector (a set of gene identifiers), or an object of class "list" or "tableList". See the Details section for more information.

...

extra parameters for function buildEnrichTable.

check.table

Boolean. If TRUE (default), function nice2x2Table is used to check that argument x adequately represents a 2x2 contingency table.

y

an object of class "character" representing a vector of valid gene identifiers (e.g., ENTREZ).

Details

Given a 2x2 arrangement of frequencies (either implemented as a "table", a "matrix" or a "numeric" object):

n_{11} n_{10}
n_{01} n_{00},

this function computes the Sorensen-Dice dissimilarity

\frac{n_{10} + n_{01}}{2 n_{11} + n_{10} + n_{01}}.

The subindex '11' corresponds to those GO terms enriched in both lists, '01' to terms enriched in the second list but not in the first one, '10' to terms enriched in the first list but not in the second one, and '00' to those GO terms not enriched in either gene list, i.e., to the double negatives, a value which is ignored in the computations.
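
As an illustration of the formula, the following is a minimal base R sketch that computes the dissimilarity directly from a 2x2 matrix. The helper name sorensen_from_matrix is hypothetical and is not part of the package; it also skips the validity checks that dSorensen can perform through nice2x2Table. The frequencies are the ones used in the Examples section.

```r
# Hypothetical helper, not part of goSorensen: Sorensen-Dice dissimilarity
# computed directly from a 2x2 matrix laid out as described above.
sorensen_from_matrix <- function(tab) {
  stopifnot(is.matrix(tab), all(dim(tab) == c(2, 2)))
  n11 <- tab[1, 1]  # enriched in both lists
  n10 <- tab[1, 2]  # enriched in the first list only
  n01 <- tab[2, 1]  # enriched in the second list only
  # tab[2, 2] holds n00 (double negatives); it does not enter the formula
  (n10 + n01) / (2 * n11 + n10 + n01)
}

# matrix() fills column-wise, so this gives n11 = 56, n01 = 1, n10 = 30, n00 = 471
tab <- matrix(c(56, 1, 30, 471), nrow = 2)
d <- sorensen_from_matrix(tab)             # (30 + 1) / (2 * 56 + 30 + 1) = 31 / 143
d_swapped <- sorensen_from_matrix(t(tab))  # exchanging the two lists gives the same value
```

Because n_{10} and n_{01} enter the formula only through their sum, transposing the table (that is, exchanging the roles of the two gene lists) leaves the result unchanged.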

In the "numeric" interface, if length(x) >= 3, the values are interpreted as (n_{11}, n_{01}, n_{10}, n_{00}), always in this order, and any extra values are discarded. The result is correct regardless of whether the frequencies are absolute or relative.
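
The invariance to absolute or relative frequencies follows from the formula being a ratio: dividing every frequency by the same total cancels out. A minimal base R sketch of this behaviour, assuming only the ordering described above (the helper sorensen_from_vector is hypothetical and is not the package's "numeric" method):

```r
# Hypothetical helper, not part of goSorensen: the first three values are read
# as (n11, n01, n10); anything from the fourth position on is not used.
sorensen_from_vector <- function(x) {
  stopifnot(is.numeric(x), length(x) >= 3)
  n11 <- x[1]
  n01 <- x[2]
  n10 <- x[3]
  (n10 + n01) / (2 * n11 + n10 + n01)
}

counts <- c(56, 1, 30, 471)

d_abs   <- sorensen_from_vector(counts)                # absolute frequencies
d_rel   <- sorensen_from_vector(counts / sum(counts))  # relative frequencies
d_three <- sorensen_from_vector(counts[1:3])           # n00 omitted altogether
```

All three calls give the same value, 31 / 143, up to floating-point rounding.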

If x is an object of class "character", then x (and y) must represent two "character" vectors of valid gene identifiers (e.g., ENTREZ). Then the dissimilarity between lists x and y is computed, after internally summarizing them as a 2x2 contingency table of joint enrichment. This last operation is performed by function buildEnrichTable. Here, "valid gene identifiers (e.g., ENTREZ)" means identifiers that are consistent with the arguments geneUniverse and orgPackg of buildEnrichTable, which are passed through the ellipsis argument ... of dSorensen.

If x is an object of class "list", the argument must be a list of "character" vectors, each one representing a gene list (character identifiers). Then, all pairwise dissimilarities between these gene lists are computed.

If x is an object of class "tableList", the Sorensen-Dice dissimilarity is computed over each one of these tables. Given k gene lists (i.e. "character" vectors of gene identifiers) l1, l2, ..., lk, an object of class "tableList" (typically constructed by a call to function buildEnrichTable) is a list of lists of contingency tables t(i,j) generated from each pair of gene lists i and j, with the following structure:

$l2

$l2$l1$t(2,1)

$l3

$l3$l1$t(3,1), $l3$l2$t(3,2)

...

$lk

$lk$l1$t(k,1), $lk$l2$t(k,2), ..., $lk$l(k-1)$t(k,k-1)
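
The following base R sketch shows how a nested structure of this kind can be turned into a symmetric matrix of pairwise dissimilarities. It is an illustration under stated assumptions, not the package's "tableList" method: the tables are invented, a plain nested list stands in for a real "tableList" object, each table is assumed to sit directly at tabs$li$lj, and the diagonal is assumed to be zero.

```r
# Hypothetical helper, not part of goSorensen
sorensen <- function(tab) {
  n11 <- tab[1, 1]
  n10 <- tab[1, 2]
  n01 <- tab[2, 1]
  (n10 + n01) / (2 * n11 + n10 + n01)
}

# Invented 2x2 tables for three gene lists, mimicking the lower-triangular
# nesting described above: tabs$l2$l1, tabs$l3$l1, tabs$l3$l2
tabs <- list(
  l2 = list(l1 = matrix(c(10, 2, 3, 85), nrow = 2)),
  l3 = list(l1 = matrix(c(4, 6, 5, 85), nrow = 2),
            l2 = matrix(c(8, 1, 1, 90), nrow = 2))
)

list_names <- c("l1", "l2", "l3")
d <- matrix(0, nrow = 3, ncol = 3, dimnames = list(list_names, list_names))

# Fill both triangles from each stored table; the diagonal stays at 0
for (i in names(tabs)) {
  for (j in names(tabs[[i]])) {
    d[i, j] <- d[j, i] <- sorensen(tabs[[i]][[j]])
  }
}
d
```

Only k(k-1)/2 tables are stored because the dissimilarity is symmetric in the two lists, so each table fills two cells of the result.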

Value

In the "table", "matrix", "numeric" and "character" interfaces, the value of the Sorensen-Dice dissimilarity. In the "list" and "tableList" interfaces, the symmetric matrix of all pairwise Sorensen-Dice dissimilarities.

Methods (by class)

  • dSorensen(table): S3 method for class "table"

  • dSorensen(matrix): S3 method for class "matrix"

  • dSorensen(numeric): S3 method for class "numeric"

  • dSorensen(character): S3 method for class "character"

  • dSorensen(list): S3 method for class "list"

  • dSorensen(tableList): S3 method for class "tableList"

See Also

buildEnrichTable for constructing contingency tables of mutual enrichment, nice2x2Table for checking the validity of contingency tables, seSorensen for computing the standard error of the dissimilarity, duppSorensen for the upper limit of a one-sided confidence interval of the dissimilarity, equivTestSorensen for an equivalence test.

Examples

# Gene lists 'atlas' and 'sanger' in 'allOncoGeneLists' dataset. Table of joint enrichment
# of GO terms in ontology BP at level 4.
data(cont_atlas.sanger_BP4)
cont_atlas.sanger_BP4
?cont_atlas.sanger_BP4
dSorensen(cont_atlas.sanger_BP4)

# Table represented as a vector:
conti4 <- c(56, 1, 30, 471)
dSorensen(conti4)
# or as a plain matrix:
dSorensen(matrix(conti4, nrow = 2))

# This function is also appropriate for proportions:
dSorensen(conti4 / sum(conti4))

conti3 <- c(56, 1, 30)
dSorensen(conti3)

# Sorensen-Dice dissimilarity from scratch, directly from two gene lists:
# (These examples may be considerably time consuming due to many enrichment
# tests to build the contingency tables of joint enrichment)
# data(allOncoGeneLists)
# ?allOncoGeneLists

# Obtaining ENTREZ identifiers for the gene universe of humans:
# library(org.Hs.eg.db)
# humanEntrezIDs <- keys(org.Hs.eg.db, keytype = "ENTREZID")

# (Time consuming, building the table requires many enrichment tests:)
# dSorensen(allOncoGeneLists$atlas, allOncoGeneLists$sanger,
#           onto = "BP", GOLevel = 4,
#           geneUniverse = humanEntrezIDs, orgPackg = "org.Hs.eg.db")

# Essentially, the above code does the same as:
# cont_atlas.sanger_BP4 <- buildEnrichTable(allOncoGeneLists$atlas, allOncoGeneLists$sanger,
#                                     onto = "BP", GOLevel = 4,
#                                     geneUniverse = humanEntrezIDs, orgPackg = "org.Hs.eg.db")
# dSorensen(cont_atlas.sanger_BP4)
# (Quite time consuming, all pairwise dissimilarities:)
# dSorensen(allOncoGeneLists,
#           onto = "BP", GOLevel = 4,
#           geneUniverse = humanEntrezIDs, orgPackg = "org.Hs.eg.db")

pablof1988/goSorensen documentation built on Dec. 15, 2024, 12:01 p.m.