equivTestSorensen: Equivalence test based on the Sorensen-Dice dissimilarity

View source: R/equivtestsorensen.R

Description

Equivalence test based on the Sorensen-Dice dissimilarity, computed either by an asymptotic normal approach or by a bootstrap approach.

Usage

equivTestSorensen(x, ...)

## S3 method for class 'table'
equivTestSorensen(
  x,
  d0 = 1/(1 + 1.25),
  conf.level = 0.95,
  boot = FALSE,
  nboot = 10000,
  check.table = TRUE,
  ...
)

## S3 method for class 'matrix'
equivTestSorensen(
  x,
  d0 = 1/(1 + 1.25),
  conf.level = 0.95,
  boot = FALSE,
  nboot = 10000,
  check.table = TRUE,
  ...
)

## S3 method for class 'numeric'
equivTestSorensen(
  x,
  d0 = 1/(1 + 1.25),
  conf.level = 0.95,
  boot = FALSE,
  nboot = 10000,
  check.table = TRUE,
  ...
)

## S3 method for class 'character'
equivTestSorensen(
  x,
  y,
  d0 = 1/(1 + 1.25),
  conf.level = 0.95,
  boot = FALSE,
  nboot = 10000,
  check.table = TRUE,
  ...
)

## S3 method for class 'list'
equivTestSorensen(
  x,
  d0 = 1/(1 + 1.25),
  conf.level = 0.95,
  boot = FALSE,
  nboot = 10000,
  check.table = TRUE,
  ...
)

## S3 method for class 'tableList'
equivTestSorensen(
  x,
  d0 = 1/(1 + 1.25),
  conf.level = 0.95,
  boot = FALSE,
  nboot = 10000,
  check.table = TRUE,
  ...
)

Arguments

x

an object of class "table", "matrix", "numeric", "character", "list" or "tableList". See the Details section for more information.

...

further arguments passed to function buildEnrichTable (e.g., geneUniverse, orgPackg, onto and GOLevel, as in the Examples section).

d0

equivalence threshold for the Sorensen-Dice dissimilarity, d. The null hypothesis states that d >= d0, i.e., inequivalence between the compared gene lists, and the alternative states that d < d0, i.e., equivalence or dissimilarity irrelevance (up to a level d0). The default value, 1/(1 + 1.25), is approximately 0.4444.
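A quick base-R look at the default threshold. Assuming the usual definition of the Sorensen-Dice dissimilarity, d = (n10 + n01) / (2 n11 + n10 + n01) (not spelled out on this page), the alternative d < d0 with d0 = 1/(1 + 1.25) is algebraically the same as 2 n11 > 1.25 (n10 + n01). The helper `sorensen_d` is hypothetical and not a goSorensen function; the sketch simply checks that identity on a grid of small counts.

```r
# Default equivalence threshold used by equivTestSorensen()
d0 <- 1 / (1 + 1.25)
print(round(d0, 4))  # 0.4444

# Hypothetical helper (not part of goSorensen), assuming the usual definition
# d = (n10 + n01) / (2 * n11 + n10 + n01)
sorensen_d <- function(n11, n10, n01) (n10 + n01) / (2 * n11 + n10 + n01)

# On a grid of small counts, d < d0 holds exactly when 2 * n11 > 1.25 * (n10 + n01)
grid <- expand.grid(n11 = 1:30, n10 = 0:30, n01 = 0:30)
d <- with(grid, sorensen_d(n11, n10, n01))
same <- (d < d0) == with(grid, 2 * n11 > 1.25 * (n10 + n01))
print(all(same))  # TRUE
```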

conf.level

confidence level of the one-sided confidence interval, a value between 0 and 1.

boot

boolean. If TRUE, the confidence interval and the test p-value are computed by means of a bootstrap approach instead of the asymptotic normal approach. Defaults to FALSE.

nboot

numeric, number of initially planned bootstrap replicates. Ignored if boot == FALSE. Defaults to 10000.

check.table

Boolean. If TRUE (default), argument x is checked to verify that it adequately represents a 2x2 contingency table (or an aggregate of such tables), or gene lists producing a correct table. This checking is performed by means of function nice2x2Table.

y

an object of class "character" representing a list of gene identifiers.

Details

This function computes either the normal asymptotic or the bootstrap equivalence test based on the Sorensen-Dice dissimilarity, given a 2x2 arrangement of frequencies (either implemented as a "table", a "matrix" or a "numeric" object):

    n11  n10
    n01  n00

The subindex '11' corresponds to the GO items enriched in both lists, '01' to items enriched in the second list but not in the first one, '10' to items enriched in the first list but not in the second one, and '00' to GO items enriched in neither gene list, i.e., to the double negatives. This last value does not enter the dissimilarity or its standard error, although the bootstrap approach needs it (see below).
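To make the four cells concrete, this base-R sketch counts them from two sets of enriched GO items within a common set of tested items. The item names are made-up placeholders and the code is only a conceptual illustration; in goSorensen, the actual enrichment testing and tabulation are done by buildEnrichTable.

```r
# Hypothetical GO items tested for enrichment (placeholder names, not real GO IDs)
tested <- paste0("term", 1:10)
enriched1 <- tested[1:5]   # enriched in the first gene list
enriched2 <- tested[4:6]   # enriched in the second gene list

lab <- c("enriched", "not enriched")
in1 <- factor(ifelse(tested %in% enriched1, lab[1], lab[2]), levels = lab)
in2 <- factor(ifelse(tested %in% enriched2, lab[1], lab[2]), levels = lab)

# Rows: first list; columns: second list, i.e. the layout n11 n10 / n01 n00
tab <- table(list1 = in1, list2 = in2)
print(tab)
# n11 = 2 (terms 4, 5), n10 = 3 (terms 1-3), n01 = 1 (term 6), n00 = 4 (terms 7-10)
```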

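In the "numeric" interface, if length(x) >= 4, the values are interpreted as (n_{11}, n_{01}, n_{10}, n_{00}), always in this order and discarding extra values if necessary. A vector of length 3 lacks n_{00}: it is accepted by the asymptotic normal approach, but the bootstrap approach needs all four frequencies (see Examples).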
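The order (n11, n01, n10, n00) is exactly R's column-major order for the 2x2 layout above, so `matrix()` with its default `byrow = FALSE` turns such a vector into the same arrangement. A sketch using the frequencies from the Examples section; the goSorensen calls are guarded, so the block also runs without the package, and the expectation that both interfaces agree is an assumption based on this page rather than a verified result.

```r
x <- c(56, 1, 30, 47)     # (n11, n01, n10, n00), as in the Examples section
m <- matrix(x, nrow = 2)  # column-major fill: column 1 = (n11, n01), column 2 = (n10, n00)
print(m)
# m[1, 1] = n11 = 56, m[1, 2] = n10 = 30, m[2, 1] = n01 = 1, m[2, 2] = n00 = 47

# If goSorensen is installed, both interfaces should describe the same table
if (requireNamespace("goSorensen", quietly = TRUE)) {
  print(goSorensen::equivTestSorensen(x))
  print(goSorensen::equivTestSorensen(m))
}
```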

If x is an object of class "character", then x and y must be two "character" vectors of valid gene identifiers. The equivalence test is then performed between x and y, after internally summarizing them as a 2x2 contingency table of joint enrichment. This summarizing step is performed by function buildEnrichTable, and "valid gene identifiers" means identifiers consistent with the arguments geneUniverse and orgPackg of buildEnrichTable, which are passed through the ellipsis argument ... of equivTestSorensen.

If x is an object of class "list", each of its elements must be a "character" vector of gene identifiers. Then all pairwise equivalence tests are performed between these gene lists.

Class "tableList" corresponds to objects holding all the mutual enrichment contingency tables generated in a pairwise fashion. Given gene lists l1, l2, ..., lk, an object of class "tableList" (typically constructed by a call to function buildEnrichTable) is a list of lists of contingency tables tij, each generated from the pair of gene lists i and j, with the following structure:

$l2
    $l2$l1$t21
$l3
    $l3$l1$t31, $l3$l2$t32
...
$lk
    $lk$l1$tk1, $lk$l2$tk2, ..., $lk$l(k-1)$tk(k-1)

If x is an object of class "tableList", the test is performed over each one of these tables.
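The nesting can be mimicked with plain R lists to see how the tables are indexed and how many there are: for k gene lists there are k(k - 1)/2 tables. The helper `pairwise_skeleton()` is hypothetical; it only builds the list-of-lists shape, with a label string in place of each contingency table, and is not how goSorensen constructs a "tableList" object.

```r
# Hypothetical helper: reproduce the lower-triangular nesting $li$lj (j < i)
pairwise_skeleton <- function(list_names) {
  k <- length(list_names)
  stopifnot(k >= 2)
  out <- lapply(2:k, function(i) {
    # Element $li holds one entry per earlier list lj (j < i)
    inner <- lapply(seq_len(i - 1), function(j) paste0("t", i, j))
    names(inner) <- list_names[seq_len(i - 1)]
    inner
  })
  names(out) <- list_names[2:k]
  out
}

sk <- pairwise_skeleton(c("l1", "l2", "l3", "l4"))
str(sk)
print(sk$l3$l2)            # "t32"
print(length(unlist(sk)))  # 6, i.e. 4 * 3 / 2
```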

The test is based on the fact that the studentized statistic (^d - d) / ^se is approximately distributed as a standard normal, where ^d stands for the sample Sorensen-Dice dissimilarity, d for its true (unknown) value and ^se for the estimate of its standard error. This result is asymptotically correct, but for finite samples the true distribution of the studentized statistic is not exactly normal: it has a heavier left tail than the Gaussian model predicts, which may produce some type I error inflation.

The bootstrap method provides a better approximation to this distribution. In the bootstrap approach, nboot new bootstrap contingency tables are generated from a multinomial distribution with parameters size = n = (n_{11} + n_{01} + n_{10} + n_{00}) and probabilities (n_{11} / n, n_{01} / n, n_{10} / n, n_{00} / n). Some of these generated tables may have such low enrichment frequencies that the Sorensen-Dice computations cannot be carried out on them. As a consequence, the number of effective bootstrap samples may be lower than the number of initially planned ones, nboot. Our simulation studies concluded that this makes the test more conservative (less prone to reject a truly false null hypothesis of inequivalence) but, in any case, protects against inflating the type I error.
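The asymptotic normal test described above can be sketched in base R. The standard error used here is the delta-method formula for the dissimilarity under multinomial sampling, ^se = 2 sqrt(n11 (n10 + n01) (n11 + n10 + n01)) / (2 n11 + n10 + n01)^2; that it coincides with seSorensen is an assumption of this sketch. Likewise, taking the p-value as the lower normal tail and the upper limit as dUpp = ^d + z_{conf.level} ^se are the standard choices for this one-sided hypothesis, not a transcription of the package code. Note that n00 enters neither quantity. The helper name `sorensen_normal_test` is hypothetical.

```r
# Minimal sketch of the asymptotic normal equivalence test (hypothetical helper,
# not goSorensen's implementation). H0: d >= d0 vs H1: d < d0.
sorensen_normal_test <- function(n11, n10, n01, d0 = 1 / (1 + 1.25),
                                 conf.level = 0.95) {
  denom <- 2 * n11 + n10 + n01
  d_hat <- (n10 + n01) / denom
  # Delta-method standard error (assumed to match seSorensen)
  se <- 2 * sqrt(n11 * (n10 + n01) * (n11 + n10 + n01)) / denom^2
  stat <- (d_hat - d0) / se
  list(estimate = d_hat,
       stderr = se,
       statistic = stat,
       p.value = pnorm(stat),                  # small when d_hat is well below d0
       dUpp = d_hat + qnorm(conf.level) * se)  # one-sided interval (0, dUpp]
}

# Frequencies from the Examples section: n11 = 56, n01 = 1, n10 = 30
res <- sorensen_normal_test(n11 = 56, n10 = 30, n01 = 1)
str(res)
```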

In a bootstrap test result, use getNboot to access the number of initially planned bootstrap replicates and getEffNboot to access the number of finally effective bootstrap replicates.
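A generic studentized (bootstrap-t) version of the same idea, as a base-R sketch: tables are drawn with `rmultinom()` using the proportions described above, replicates whose statistic cannot be computed (zero or undefined standard error) are dropped, so the effective count can fall below nboot, and the p-value and upper limit come from the empirical distribution of the bootstrap statistics. The helper name and the exact p-value and quantile conventions are assumptions; goSorensen's own implementation may differ in these details, and it reports the counts through getNboot and getEffNboot rather than list fields.

```r
# Hypothetical bootstrap-t sketch (not goSorensen's code).
# Counts are given in the "numeric" interface order (n11, n01, n10, n00).
sorensen_boot_test <- function(x, d0 = 1 / (1 + 1.25), conf.level = 0.95,
                               nboot = 10000) {
  stopifnot(length(x) == 4)  # all four frequencies are needed for resampling
  d_se <- function(n11, n01, n10) {
    denom <- 2 * n11 + n10 + n01
    list(d = (n10 + n01) / denom,
         se = 2 * sqrt(n11 * (n10 + n01) * (n11 + n10 + n01)) / denom^2)
  }
  n <- sum(x)
  obs <- d_se(x[1], x[2], x[3])
  stat <- (obs$d - d0) / obs$se

  # nboot multinomial tables with size n and probabilities x / n (one per column)
  tabs <- rmultinom(nboot, size = n, prob = x / n)
  bt <- d_se(tabs[1, ], tabs[2, ], tabs[3, ])
  t_star <- (bt$d - obs$d) / bt$se
  # Drop replicates where the studentized statistic is Inf or NaN (se == 0 or undefined)
  t_star <- t_star[is.finite(t_star)]

  list(statistic = stat,
       p.value = mean(t_star <= stat),
       dUpp = obs$d - quantile(t_star, 1 - conf.level, names = FALSE) * obs$se,
       nboot = nboot,
       effNboot = length(t_star))
}

set.seed(123)
bres <- sorensen_boot_test(c(56, 1, 30, 47), nboot = 2000)
str(bres)
```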

Value

For all interfaces except "list" and "tableList", the result is a list of class "equivSDhtest", which inherits from "htest", with the following components:

statistic

the value of the studentized statistic (dSorensen(x) - d0) / seSorensen(x)

p.value

the p-value of the test

conf.int

the one-sided confidence interval (0, dUpp]

estimate

the Sorensen dissimilarity estimate, dSorensen(x)

null.value

the value of d0

stderr

the standard error of the Sorensen dissimilarity estimate, seSorensen(x), used as denominator in the studentized statistic

alternative

a character string describing the alternative hypothesis

method

a character string describing the test

data.name

a character string giving the names of the data

enrichTab

the 2x2 contingency table of joint enrichment on which the test was based

For the "list" and "tableList" interfaces, the result is an object of class "equivSDhtestList": a list with the results of all pairwise comparisons, each one being an object of class "equivSDhtest".
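For the asymptotic test, the p.value and conf.int components lead to the same decision: with alpha = 1 - conf.level, p.value < alpha exactly when dUpp < d0, because both conditions reduce to ^d + z_{1-alpha} ^se < d0. This base-R sketch checks that identity over a grid of made-up estimates and standard errors. It assumes p.value = pnorm(statistic) and dUpp = ^d + qnorm(conf.level) ^se, as in a standard one-sided normal test, rather than reading these fields from an actual "equivSDhtest" object.

```r
d0 <- 1 / (1 + 1.25)
conf.level <- 0.95
# Made-up estimates and standard errors, for illustration only
grid <- expand.grid(d_hat = seq(0.05, 0.95, by = 0.01),
                    se = seq(0.01, 0.20, by = 0.01))

stat <- (grid$d_hat - d0) / grid$se
p_value <- pnorm(stat)
d_upp <- grid$d_hat + qnorm(conf.level) * grid$se

reject_by_p <- p_value < 1 - conf.level
reject_by_ci <- d_upp < d0
print(table(reject_by_p, reject_by_ci))  # off-diagonal cells should be empty
```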

Methods (by class)

  • equivTestSorensen(table): S3 method for class "table"

  • equivTestSorensen(matrix): S3 method for class "matrix"

  • equivTestSorensen(numeric): S3 method for class "numeric"

  • equivTestSorensen(character): S3 method for class "character"

  • equivTestSorensen(list): S3 method for class "list"

  • equivTestSorensen(tableList): S3 method for class "tableList"

See Also

nice2x2Table for checking and reformatting data; dSorensen for computing the Sorensen-Dice dissimilarity; seSorensen for computing the standard error of the dissimilarity; duppSorensen for the upper limit of a one-sided confidence interval of the dissimilarity; getTable, getPvalue, getUpper, getSE, getNboot and getEffNboot for accessing specific fields in the result of these testing functions; update for updating the result of these testing functions with other equivalence limits or confidence levels, or for converting a normal result into a bootstrap result or vice versa.

Examples

# Gene lists 'atlas' and 'sanger' in 'allOncoGeneLists' dataset. Table of joint enrichment
# of GO items in ontology BP at level 3.
data(tab_atlas.sanger_BP3)
tab_atlas.sanger_BP3
equivTestSorensen(tab_atlas.sanger_BP3)
# Bootstrap test:
equivTestSorensen(tab_atlas.sanger_BP3, boot = TRUE)

# Equivalence tests from scratch, directly from gene lists:
# (These examples may be very time-consuming, since building the contingency
# tables of mutual enrichment requires many enrichment tests)
# ?pbtGeneLists
# Gene universe:
# data(humanEntrezIDs)
# equivTestSorensen(pbtGeneLists[["IRITD3"]], pbtGeneLists[["IRITD5"]],
#                   geneUniverse = humanEntrezIDs, orgPackg = "org.Hs.eg.db",
#                   onto = "CC", GOLevel = 5)
# Bootstrap instead of normal approximation test:
# equivTestSorensen(pbtGeneLists[["IRITD3"]], pbtGeneLists[["IRITD5"]],
#                   geneUniverse = humanEntrezIDs, orgPackg = "org.Hs.eg.db",
#                   onto = "CC", GOLevel = 5,
#                   boot = TRUE)

# Essentially, the above code does the following:
# IRITD3vs5.CC5 <- buildEnrichTable(pbtGeneLists[["IRITD3"]], pbtGeneLists[["IRITD5"]],
#                                   geneUniverse = humanEntrezIDs, orgPackg = "org.Hs.eg.db",
#                                   onto = "CC", GOLevel = 5)
# IRITD3vs5.CC5
# equivTestSorensen(IRITD3vs5.CC5)
# equivTestSorensen(IRITD3vs5.CC5, boot = TRUE)
# (Building the contingency table first may save time, since it can then be reused in several tests.)

# All pairwise equivalence tests:
# equivTestSorensen(pbtGeneLists,
#                   geneUniverse = humanEntrezIDs, orgPackg = "org.Hs.eg.db",
#                   onto = "CC", GOLevel = 5)


# Equivalence test on a contingency table represented as a numeric vector:
equivTestSorensen(c(56, 1, 30, 47))
equivTestSorensen(c(56, 1, 30, 47), boot = TRUE)
# Without n00, only the asymptotic normal approach can be used:
equivTestSorensen(c(56, 1, 30))
# Error: all frequencies are needed for bootstrap:
try(equivTestSorensen(c(56, 1, 30), boot = TRUE), TRUE)

pablof1988/goSorensen documentation built on July 21, 2023, 8:38 a.m.