davidClustering_kappa: Cluster rows of a Kappa-statistic matrix by the hierarchical...
In bedapub/ribiosMath: Mathematical and Statistical Tools in Ribios

davidClustering_kappa

R Documentation

Cluster rows of a Kappa-statistic matrix by the hierarchical fuzzy multi-linkage partitioning method proposed by DAVID

Description

The function implements the hierarchical fuzzy multi-linkage partitioning method used in the DAVID Bioinformatics tool.
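For orientation, an entry of a Kappa-statistic matrix can be read as the chance-corrected agreement between two binary annotation rows. The sketch below computes Cohen's kappa for one pair of 0/1 vectors in base R. The helper `cohenKappaBinary` is hypothetical and not part of ribiosMath; it assumes that the matrix entries are Cohen's kappa values, and it is not guaranteed to match `rowKappa` numerically.

```r
# Hypothetical helper (not part of ribiosMath): Cohen's kappa between two
# binary (0/1) vectors of equal length.
cohenKappaBinary <- function(x, y) {
  stopifnot(length(x) == length(y))
  # Observed agreement: fraction of positions where x and y coincide
  observed <- mean(x == y)
  # Agreement expected by chance, from the marginal frequencies of 1 and 0
  expected <- mean(x) * mean(y) + (1 - mean(x)) * (1 - mean(y))
  (observed - expected) / (1 - expected)
}

geneA <- c(1, 1, 1, 1, 0, 0, 0, 0)
geneB <- c(1, 1, 1, 0, 0, 0, 0, 1)
kappaAB <- cohenKappaBinary(geneA, geneB)
kappaAA <- cohenKappaBinary(geneA, geneA)
```

Here the two rows agree at six of eight positions and chance agreement is 0.5, so `kappaAB` is 0.5, which would pass the default `kappaThr` of 0.35.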

Usage

davidClustering_kappa(
  kappaMatrix,
  kappaThr = 0.35,
  initialGroupMembership = 3L,
  multiLinkageThr = 0.5,
  mergeRule = 1L
)

Arguments

`kappaMatrix`	A numeric matrix of Kappa statistics, typically returned by `rowKappa` or `colKappa`.
`kappaThr`	Numeric, the Kappa-statistic threshold used to select initial seeds. Default value: 0.35, as recommended by the authors of the original study based on their experience.
`initialGroupMembership`	Non-negative integer, the minimum number of members in an initial group. Default value: 3.
`multiLinkageThr`	Numeric, the minimal linkage required for two groups to be merged. Default value: 0.5.
`mergeRule`	Integer, how two seeds are merged. The following merge rules are currently implemented:
1 (OR RULE): the length of the intersection divided by the length of either seed is no less than `multiLinkageThr`. Empirical evidence suggests that the result is somewhat more coarse-grained than that of the native DAVID clustering algorithm, but the performance is quite good judged by biological relevance.
2 (AND RULE): the length of the intersection divided by the length of each of the two seeds is no less than `multiLinkageThr`, which in empirical experience gives slightly fragmented clusters.
3 (UNION RULE): the length of the intersection divided by the length of the union is no less than `multiLinkageThr`, which performs similarly to the AND RULE above.
4 (GMEAN RULE): the geometric mean of the length of the intersection divided by the length of each seed is no less than `multiLinkageThr`; the clusters tend to be highly fragmented.
5 (AMEAN RULE): the arithmetic mean of the length of the intersection divided by the length of each seed is no less than `multiLinkageThr`; a few items tend to appear in multiple clusters.
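The five merge rules can be illustrated with a small base-R sketch. The helper `shouldMerge` is hypothetical and not part of ribiosMath; it only mirrors the rule descriptions above for a single pair of seeds, given as vectors of item names, and is not the package's implementation.

```r
# Hypothetical helper (not part of ribiosMath): decide whether two seeds
# would be merged under one of the five rules described above.
shouldMerge <- function(seedA, seedB, multiLinkageThr = 0.5, mergeRule = 1L) {
  nCommon <- length(intersect(seedA, seedB))
  # Shared items as a fraction of each seed
  fracA <- nCommon / length(unique(seedA))
  fracB <- nCommon / length(unique(seedB))
  score <- switch(as.character(mergeRule),
    "1" = max(fracA, fracB),                       # OR: either seed
    "2" = min(fracA, fracB),                       # AND: both seeds
    "3" = nCommon / length(union(seedA, seedB)),   # UNION
    "4" = sqrt(fracA * fracB),                     # GMEAN
    "5" = (fracA + fracB) / 2,                     # AMEAN
    stop("mergeRule must be an integer between 1 and 5"))
  score >= multiLinkageThr
}

seedA <- c("a", "b", "c", "d")
seedB <- c("c", "d", "e", "f", "g", "h")
# One decision per rule, in the order 1 to 5
mergeDecisions <- sapply(1:5, function(rule) {
  shouldMerge(seedA, seedB, multiLinkageThr = 0.5, mergeRule = rule)
})
```

With these two seeds, two shared items make up one half of `seedA` and one third of `seedB`, so only the OR RULE reaches the threshold of 0.5. This is in line with the remark that rule 1 yields coarser clusters than the other rules.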

Note

The function has only been tested on a few anecdotal examples. Caution and more systematic testing are required before it is applied to critical datasets.

Author(s)

Jitao David Zhang <jitao_david.zhang@roche.com>

References

Huang *et al.* The DAVID Gene Functional Classification Tool: a novel biological module-centric algorithm to functionally analyze large gene lists. Genome Biology, 2007
Additional file of the manuscript available at https://david.ncifcrf.gov/helps/2D_Introduction_files/additional_file_13.doc

Examples

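library(ribiosMath)

## Synthetic binary matrix: 8 genes (rows) by 15 terms (columns)
synData <- matrix(c(rep(c(rep(1, 10), rep(0, 5)), 3),
                    rep(0, 4), rep(1, 7), rep(0, 4),
                    rep(c(rep(0, 5), rep(1, 10)), 3),
                    rep(c(rep(0, 3), 1), 4)[-16]),
                  ncol = 15, byrow = TRUE)
rownames(synData) <- sprintf("Gene %s", letters[1:8])
colnames(synData) <- sprintf("t%d", 1:15)
## Pairwise Kappa statistics between rows, rounded to two digits
synKappaMat <- rowKappa(synData)
synKappaMat.round2 <- round(synKappaMat, 2)
## Cluster the rows with the default parameters
davidClustering_kappa(synKappaMat.round2)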

bedapub/ribiosMath documentation built on Jan. 29, 2023, 1:48 p.m.