davidClustering_kappa: Cluster rows of a Kappa-statistic-matrix by the hierarchical...

View source: R/RcppExports.R

davidClustering_kappa    R Documentation

Cluster rows of a Kappa-statistic-matrix by the hierarchical fuzzy multi-linkage partitioning method proposed by DAVID

Description

The function implements the hierarchical fuzzy multi-linkage partitioning method used in the DAVID Bioinformatics tool.

Usage

davidClustering_kappa(
  kappaMatrix,
  kappaThr = 0.35,
  initialGroupMembership = 3L,
  multiLinkageThr = 0.5,
  mergeRule = 1L
)

Arguments

kappaMatrix

A numeric matrix of Kappa statistics, typically returned by rowKappa or colKappa.

kappaThr

Numeric, the threshold of the Kappa statistic used to select initial seeds. Default value: 0.35, as recommended by the authors of the original study based on their experience.

initialGroupMembership

Non-negative integer, the minimum number of members in an initial group. Default value: 3.

multiLinkageThr

Numeric, the minimal linkage between two groups to be merged. Default value: 0.5.

mergeRule

Integer, the rule by which two seeds are merged. See below.

Currently, the following merge rules are implemented:

  • 1 (OR RULE): the length of the intersection divided by the length of either seed is no less than multiLinkageThr. Empirical evidence suggests that the result is somewhat coarser-grained than that of the native DAVID clustering algorithm, but the performance is quite good judged by biological relevance.

  • 2 (AND RULE): the length of the intersection divided by the length of each of the two seeds is no less than multiLinkageThr. In empirical experience, this gives slightly fragmented clusters.

  • 3 (UNION RULE): the length of the intersection divided by the length of the union is no less than multiLinkageThr. This performs similarly to the AND RULE above.

  • 4 (GMEAN RULE): the geometric mean of the length of the intersection divided by the length of each of the two seeds is no less than multiLinkageThr. The clusters tend to be highly fragmented.

  • 5 (AMEAN RULE): the arithmetic mean of the length of the intersection divided by the length of each of the two seeds is no less than multiLinkageThr. A few items tend to appear in multiple clusters.
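The five rules above can be read as five ways of scoring the overlap between two seeds. The following is a minimal sketch of that reading, not the package's implementation: shouldMerge is a hypothetical helper that does not exist in ribiosMath, and it assumes that each seed is a vector of member identifiers and that "no less than" means a score greater than or equal to multiLinkageThr.

```r
## Hypothetical helper (not part of ribiosMath): decide whether two seeds,
## given as vectors of member identifiers, satisfy a merge rule.
shouldMerge <- function(seedA, seedB, multiLinkageThr = 0.5, mergeRule = 1L) {
  common <- length(intersect(seedA, seedB))
  fracA <- common / length(unique(seedA))  # share of seed A that is common
  fracB <- common / length(unique(seedB))  # share of seed B that is common
  score <- switch(as.character(mergeRule),
    "1" = max(fracA, fracB),                           # OR: either seed
    "2" = min(fracA, fracB),                           # AND: both seeds
    "3" = common / length(union(seedA, seedB)),        # UNION
    "4" = sqrt(fracA * fracB),                         # GMEAN
    "5" = (fracA + fracB) / 2,                         # AMEAN
    stop("mergeRule must be an integer between 1 and 5"))
  score >= multiLinkageThr
}

## Two toy seeds sharing two members: 2/4 of seed A, 2/6 of seed B
seedA <- c("a", "b", "c", "d")
seedB <- c("c", "d", "e", "f", "g", "h")
ruleResults <- sapply(1:5, function(r) shouldMerge(seedA, seedB, mergeRule = r))
print(ruleResults)
## [1]  TRUE FALSE FALSE FALSE FALSE
```

In this toy case only the OR RULE reaches the default threshold of 0.5, which is consistent with the note above that rule 1 yields coarser clusters than the other rules.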

Note

The function has only been tested on a few anecdotal examples. Caution and more systematic testing are required before it is applied to critical datasets.

Author(s)

Jitao David Zhang <jitao_david.zhang@roche.com>

Examples

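library(ribiosMath)

## Synthetic binary matrix with 8 rows (genes) and 15 columns
synData <- matrix(c(rep(c(rep(1, 10), rep(0, 5)), 3),
                    rep(0, 4), rep(1, 7), rep(0, 4),
                    rep(c(rep(0, 5), rep(1, 10)), 3),
                    rep(c(rep(0, 3), 1), 4)[-16]),
                  ncol = 15, byrow = TRUE)
rownames(synData) <- sprintf("Gene %s", letters[1:8])
colnames(synData) <- sprintf("t%d", 1:15)
## Pairwise Kappa statistics between rows, rounded to two decimals
synKappaMat <- rowKappa(synData)
synKappaMat.round2 <- round(synKappaMat, 2)
## Cluster the rows with the default parameters
davidClustering_kappa(synKappaMat.round2)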
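
To make the roles of kappaMatrix, kappaThr and initialGroupMembership more concrete, the sketch below computes Cohen's kappa between binary rows in base R and then derives candidate seeds from the result. It rests on several assumptions: cohenKappa, pairwiseKappa and initialSeeds are hypothetical helpers, not ribiosMath functions; rowKappa may compute or handle edge cases differently; and the seed-selection step (each row together with all rows whose kappa reaches kappaThr, kept only if it has at least initialGroupMembership members) is one plausible reading of the argument descriptions rather than the package's actual procedure.

```r
## Cohen's kappa for two binary (0/1) vectors of equal length
cohenKappa <- function(x, y) {
  stopifnot(length(x) == length(y))
  po <- mean(x == y)                                   # observed agreement
  pe <- mean(x) * mean(y) + (1 - mean(x)) * (1 - mean(y))  # chance agreement
  if (pe == 1) return(NA_real_)                        # undefined case
  (po - pe) / (1 - pe)
}

## Pairwise kappa between all rows of a binary matrix
pairwiseKappa <- function(mat) {
  n <- nrow(mat)
  res <- matrix(NA_real_, n, n,
                dimnames = list(rownames(mat), rownames(mat)))
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      res[i, j] <- cohenKappa(mat[i, ], mat[j, ])
    }
  }
  res
}

## Assumed seed selection: for each row, collect the rows whose kappa is
## at least kappaThr, and keep seeds with enough members
initialSeeds <- function(kappaMatrix, kappaThr = 0.35,
                         initialGroupMembership = 3L) {
  seeds <- lapply(seq_len(nrow(kappaMatrix)), function(i) {
    which(kappaMatrix[i, ] >= kappaThr)
  })
  names(seeds) <- rownames(kappaMatrix)
  seeds[lengths(seeds) >= initialGroupMembership]
}

## Same synthetic data as in the example above
synData <- matrix(c(rep(c(rep(1, 10), rep(0, 5)), 3),
                    rep(0, 4), rep(1, 7), rep(0, 4),
                    rep(c(rep(0, 5), rep(1, 10)), 3),
                    rep(c(rep(0, 3), 1), 4)[-16]),
                  ncol = 15, byrow = TRUE)
rownames(synData) <- sprintf("Gene %s", letters[1:8])

kappaByHand <- pairwiseKappa(synData)
print(round(kappaByHand, 2))
seeds <- initialSeeds(kappaByHand)
print(seeds)
```

Identical rows give a kappa of 1, and rows that agree less often than expected by chance give a negative kappa, so only rows with similar membership patterns end up in the same candidate seed.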


bedapub/ribiosMath documentation built on Jan. 29, 2023, 1:48 p.m.