duppSorensen: Upper limit of a one-sided confidence interval (0, dUpp] for...

View source: R/duppsorensen.R

duppSorensenR Documentation

Upper limit of a one-sided confidence interval (0, dUpp] for the Sorensen-Dice dissimilarity

Description

Upper limit of a one-sided confidence interval (0, dUpp] for the Sorensen-Dice dissimilarity

Usage

duppSorensen(x, ...)

## S3 method for class 'table'
duppSorensen(
  x,
  dis = dSorensen.table(x, check.table = FALSE),
  se = seSorensen.table(x, check.table = FALSE),
  conf.level = 0.95,
  z.conf.level = qnorm(1 - conf.level),
  boot = FALSE,
  nboot = 10000,
  check.table = TRUE,
  ...
)

## S3 method for class 'matrix'
duppSorensen(
  x,
  dis = dSorensen.matrix(x, check.table = FALSE),
  se = seSorensen.matrix(x, check.table = FALSE),
  conf.level = 0.95,
  z.conf.level = qnorm(1 - conf.level),
  boot = FALSE,
  nboot = 10000,
  check.table = TRUE,
  ...
)

## S3 method for class 'numeric'
duppSorensen(
  x,
  dis = dSorensen.numeric(x, check.table = FALSE),
  se = seSorensen.numeric(x, check.table = FALSE),
  conf.level = 0.95,
  z.conf.level = qnorm(1 - conf.level),
  boot = FALSE,
  nboot = 10000,
  check.table = TRUE,
  ...
)

## S3 method for class 'character'
duppSorensen(
  x,
  y,
  conf.level = 0.95,
  boot = FALSE,
  nboot = 10000,
  check.table = TRUE,
  ...
)

## S3 method for class 'list'
duppSorensen(
  x,
  conf.level = 0.95,
  boot = FALSE,
  nboot = 10000,
  check.table = TRUE,
  ...
)

## S3 method for class 'tableList'
duppSorensen(
  x,
  conf.level = 0.95,
  boot = FALSE,
  nboot = 10000,
  check.table = TRUE,
  ...
)

Arguments

x

either an object of class "table", "matrix" or "numeric" representing a 2x2 contingency table, or a "character" (a set of gene identifiers) or "list" or "tableList" object. See the details section for more information.

...

additional arguments for function buildEnrichTable.

dis

Sorensen-Dice dissimilarity value. Only required to speed computations if this value is known in advance.

se

standard error estimate of the sample dissimilarity. Only required to speed computations if this value is known in advance.

conf.level

confidence level of the one-sided confidence interval, a numeric value between 0 and 1.

z.conf.level

standard normal (or bootstrap, see arguments below) distribution quantile at the 1 - conf.level value. Only required to speed computations if this value is known in advance. Then, the argument conf.level is ignored.

boot

boolean. If TRUE, z.conf.level is computed by means of a bootstrap approach instead of the asymptotic normal approach. Defaults to FALSE.

nboot

numeric, number of initially planned bootstrap replicates. Ignored if boot == FALSE. Defaults to 10000.

check.table

Boolean. If TRUE (default), argument x is checked to adequately represent a 2x2 contingency table. This checking is performed by means of function nice2x2Table.

y

an object of class "character" representing a vector of gene identifiers (e.g., ENTREZ).

Details

This function computes the upper limit of a one-sided confidence interval for the Sorensen-Dice dissimilarity, given a 2x2 arrangement of frequencies (either implemented as a "table", a "matrix" or a "numeric" object):

n_{11} n_{10}
n_{01} n_{00},

The subindex '11' corresponds to those GO terms enriched in both lists, '01' to terms enriched in the second list but not in the first one, '10' to terms enriched in the first list but not enriched in the second one and '00' corresponds to those GO terms non enriched in both gene lists, i.e., to the double negatives, a value which is ignored in the computations, except if boot == TRUE.

In the "numeric" interface, if length(x) >= 4, the values are interpreted as (n_{11}, n_{01}, n_{10}, n_{00}), always in this order and discarding extra values if necessary.

Arguments dis, se and z.conf.level are not required. If known in advance (e.g., as a consequence of previous computations with the same data), providing its value may speed the computations.

By default, z.conf.level corresponds to the 1 - conf.level quantile of a standard normal N(0,1) distribution, as the studentized statistic (^d - d) / ^se) is asymptotically N(0,1). In the studentized statistic, d stands for the "true" Sorensen-Dice dissimilarity, ^d to its sample estimate and ^se for the estimate of its standard error. In fact, the normal is its limiting distribution but, for finite samples, the true sampling distribution may present departures from normality (mainly with some inflation in the left tail). The bootstrap method provides a better approximation to the true sampling distribution. In the bootstrap approach, nboot new bootstrap contingency tables are generated from a multinomial distribution with parameters size = n = n_{11} + n_{01} + n_{10} + n_{00} and probabilities (n_{11} / n, n_{01} / n, n_{10}, n_{00} / n). Sometimes, some of these generated tables may present so low frequencies of enrichment that make them unable for Sorensen-Dice computations. As a consequence, the number of effective bootstrap samples may be lower than the number of initially planned bootstrap samples nboot. Computing in advance the value of argument z.conf.level may be a way to cope with these departures from normality, by means of a more adequate quantile function. Alternatively, if boot == TRUE, a bootstrap quantile is internally computed.

If x is an object of class "character", then x (and y) must represent two "character" vectors of valid gene identifiers (e.g., ENTREZ). Then the confidence interval for the dissimilarity between lists x and y is computed, after internally summarizing them as a 2x2 contingency table of joint enrichment. This last operation is performed by function buildEnrichTable and "valid gene identifiers (e.g., ENTREZ)" stands for the coherency of these gene identifiers with the arguments geneUniverse and orgPackg of buildEnrichTable, passed by the ellipsis argument ... in dUppSorensen.

In the "list" interface, the argument must be a list of "character" vectors, each one representing a gene list (character identifiers). Then, all pairwise upper limits of the dissimilarity between these gene lists are computed.

In the "tableList" interface, the upper limits are computed over each one of these tables. Given gene lists (i.e. "character" vectors of gene identifiers) l1, l2, ..., lk, an object of class "tableList" (typically constructed by a call to function buildEnrichTable) is a list of lists of contingency tables t(i,j) generated from each pair of gene lists i and j, with the following structure:

$l2

$l2$l1$t(2,1)

$l3

$l3$l1$t(3,1), $l3$l2$t(3,2)

...

$lk

$lk$l1$t(k,1), $lk$l2$t(k,2), ..., $lk$l(k-1)t(k,k-1)

Value

In the "table", "matrix", "numeric" and "character" interfaces, the value of the Upper limit of the confidence interval for the Sorensen-Dice dissimilarity. When boot == TRUE, this result also haves a an extra attribute: "eff.nboot" which corresponds to the number of effective bootstrap replicats, see the details section. In the "list" and "tableList" interfaces, the result is the symmetric matrix of all pairwise upper limits.

Methods (by class)

  • duppSorensen(table): S3 method for class "table"

  • duppSorensen(matrix): S3 method for class "matrix"

  • duppSorensen(numeric): S3 method for class "numeric"

  • duppSorensen(character): S3 method for class "character"

  • duppSorensen(list): S3 method for class "list"

  • duppSorensen(tableList): S3 method for class "tableList"

See Also

buildEnrichTable for constructing contingency tables of mutual enrichment, nice2x2Table for checking contingency tables validity, dSorensen for computing the Sorensen-Dice dissimilarity, seSorensen for computing the standard error of the dissimilarity, equivTestSorensen for an equivalence test.

Examples

# Gene lists 'atlas' and 'sanger' in 'Cangenes' dataset. Table of joint enrichment
# of GO terms in ontology BP at level 3.
data(tab_atlas.sanger_BP3)
?tab_atlas.sanger_BP3
duppSorensen(tab_atlas.sanger_BP3)
dSorensen(tab_atlas.sanger_BP3) + qnorm(0.95) * seSorensen(tab_atlas.sanger_BP3)
# Using the bootstrap approximation instead of the normal approximation to
# the sampling distribution of (^d - d) / se(^d):
duppSorensen(tab_atlas.sanger_BP3, boot = TRUE)

# Contingency table as a numeric vector:
duppSorensen(c(56, 1, 30, 47))
duppSorensen(c(56, 1, 30))

# Upper confidence limit for the Sorensen-Dice dissimilarity, from scratch,
# directly from two gene lists:
# (These examples may be considerably time consuming due to many enrichment
# tests to build the contingency tables of mutual enrichment)
# data(allOncoGeneLists)
# ?allOncoGeneLists

# Obtaining ENTREZ identifiers for the gene universe of humans:
# library(org.Hs.eg.db)
# humanEntrezIDs <- keys(org.Hs.eg.db, keytype = "ENTREZID")

# Computing the Upper confidence limit:
# duppSorensen(allOncoGeneLists$atlas, allOncoGeneLists$sanger,
#              onto = "CC", GOLevel = 5,
#              geneUniverse = humanEntrezIDs, orgPackg = "org.Hs.eg.db")
# Even more time consuming (all pairwise values):
# duppSorensen(allOncoGeneLists,
#              onto = "CC", GOLevel = 5,
#              geneUniverse = humanEntrezIDs, orgPackg = "org.Hs.eg.db")

pablof1988/goSorensen documentation built on July 31, 2024, 10:26 p.m.