geva.hcluster: GEVA Hierarchical Clustering


View source: R/hclustering.R

Description

Performs a hierarchical cluster analysis from summarized data.

Usage

geva.hcluster(
  sv,
  resolution = 0.3,
  hc.method = options.hc.method,
  hc.metric = options.hc.metric,
  cl.score.method = options.cl.score.method,
  ...,
  include.raw.results = FALSE
)

options.hc.metric
# c("euclidean", "maximum", "manhattan", "canberra", 
#   "binary", "minkowski")

options.hc.method
# c("centroid", "median", "ward", "single")

Arguments

sv

a numeric SVTable object (usually GEVASummary)

resolution

numeric (0 to 1), used as a "zoom" parameter for cluster detection. A value of 0 returns the minimum number of clusters that can be detected, while 1 returns the maximum number of detectable clusters

hc.method

character, the agglomeration method to be used. Used as the method argument for fastcluster::hclust.vector()

hc.metric

character, the distance measure to be used. Used as the metric argument for fastcluster::hclust.vector()

cl.score.method

character, method used to calculate the cluster scores for each point. If "auto", the "hclust.height" method is selected

...

additional arguments:

  • mink.p : numeric, parameter for the Minkowski metric. Used as the p argument for fastcluster::hclust.vector()

  • verbose : logical, whether to print the current progress (default is TRUE)

include.raw.results

logical, whether to attach intermediate results to the returned object

Details

This function performs a hierarchical cluster analysis using methods implemented in the fastcluster package, particularly the fastcluster::hclust.vector() function. The values available for hc.method and hc.metric are described in that function's documentation page (see fastcluster::hclust.vector()).
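The effect of the distance measure can be illustrated without GEVA or fastcluster. The sketch below is only a stand-in: it uses base R's stats::dist() and stats::hclust() rather than fastcluster::hclust.vector(), the input matrix X is made-up data, and "binary" is left out because it is meant for presence/absence data. According to the fastcluster documentation, the "ward", "centroid" and "median" methods require the Euclidean metric, so single linkage is used here to compare metrics.

```r
# Stand-in sketch: stats::dist() + stats::hclust() instead of
# fastcluster::hclust.vector(). X is hypothetical data, not GEVA output.
set.seed(42)
X <- matrix(rnorm(60), ncol = 2)   # 30 made-up points in two dimensions

# Metrics from options.hc.metric, except "binary" (see the text above)
metrics <- c("euclidean", "maximum", "manhattan", "canberra", "minkowski")

# Single-linkage tree for each metric; p is only used by "minkowski"
trees <- lapply(metrics, function(m) {
  d <- dist(X, method = m, p = 3)
  hclust(d, method = "single")
})
names(trees) <- metrics

# Total hierarchy height (height of the last merge) under each metric
total_heights <- vapply(trees, function(tr) max(tr$height), numeric(1))
print(round(total_heights, 3))
```

The total height differs from one metric to another, which is one reason a relative parameter such as resolution is more convenient than an absolute cut height.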

The resolution value is an accessible way to define the cluster separation threshold used in hierarchical clustering. The algorithm produces a dendrogram-like hierarchy in which each level/node is separated from the next by a distance (sometimes called "height"), and the resolution maps a value between 0 and 1 to a proportional value within the total hierarchy height. A resolution of 0 therefore yields the smallest possible number of clusters (usually two), and a resolution of 1 yields the largest (approximately one cluster per point).
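The idea of translating resolution into a cut height can be sketched with base R. This is not GEVA's implementation: the linear mapping in resolution_to_height() is an assumption made for illustration, both helper functions are hypothetical, and stats::hclust() stands in for fastcluster::hclust.vector().

```r
# Illustrative sketch only; the mapping below is an assumed one, not GEVA's.
set.seed(1)
pts <- rbind(
  matrix(rnorm(40, mean = 0, sd = 0.3), ncol = 2),  # 20 points around (0, 0)
  matrix(rnorm(40, mean = 3, sd = 0.3), ncol = 2)   # 20 points around (3, 3)
)
hc <- hclust(dist(pts, method = "euclidean"), method = "single")

# Hypothetical helper: resolution 0 cuts at the top of the tree,
# resolution 1 cuts at height zero
resolution_to_height <- function(hc, resolution) {
  stopifnot(resolution >= 0, resolution <= 1)
  (1 - resolution) * max(hc$height)
}

# Hypothetical helper: number of clusters obtained at a given resolution
count_clusters <- function(hc, resolution) {
  h <- resolution_to_height(hc, resolution)
  length(unique(cutree(hc, h = h)))
}

resolutions <- c(0, 0.3, 0.6, 1)
n_clusters <- vapply(resolutions, function(r) count_clusters(hc, r), integer(1))
print(data.frame(resolution = resolutions, clusters = n_clusters))
```

In this sketch a resolution of 0 cuts at the root and gives a single cluster, and a resolution of 1 gives one cluster per point. GEVA reports that it usually returns two clusters at 0, so its actual mapping evidently differs in detail.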

If include.raw.results is TRUE, some additional data will be attached to the info slot of the returned GEVACluster object, including the kNN tree generated during the intermediate steps.

Value

A GEVACluster object

Note

In hierarchical clustering, all points are clustered. Therefore, setting resolution to 1 will result in one cluster per point, at which point the cluster analysis may become pointless (no pun intended).

See Also

Other geva.cluster: geva.cluster(), geva.dcluster(), geva.quantiles()

Examples

## Hierarchical clustering from a randomly generated input 

# Preparing the data
ginput <- geva.ideal.example()      # Generates a random input example
gsummary <- geva.summarize(ginput)  # Summarizes with the default parameters

# Hierarchical clustering
gclust <- geva.hcluster(gsummary)
plot(gclust)

# Hierarchical clustering with slightly more resolution
gclust <- geva.hcluster(gsummary,
                       resolution=0.35)
plot(gclust)

sbcblab/geva documentation built on March 15, 2021, 10:08 p.m.