geva.cluster: GEVA Cluster Analysis
In sbcblab/geva: Gene Expression Variation Analysis (GEVA)

Description Usage Arguments Details Value See Also Examples

Performs a cluster analysis from summarized data.

geva.cluster(
  sv,
  cluster.method = options.cluster.method,
  cl.score.method = options.cl.score.method,
  resolution = 0.3,
  distance.method = options.distance,
  ...,
  grouped.return = FALSE
)

options.cluster.method
# c("hierarchical", "density", "quantiles")

options.cl.score.method
# c("auto", "hclust.height", "density", "centroid")

options.distance
# c("euclidean", "manhattan")

`sv`	a `numeric` `SVTable` object (usually `GEVASummary`)
`cluster.method`	`character`, one of the main grouping methods (see ‘Details’)
`cl.score.method`	`character`, method used to calculate the cluster scores for each point. Ignored if `cluster.method` is `quantiles`
`resolution`	`numeric` (`0` to `1`), used as a "zoom" parameter for cluster detection. A zero value returns the minimum number of clusters that can detected by the `cluster.method`, while `1` returns the maximum amount of clusters. Ignored if `cluster.method` is `quantiles`
`distance.method`	`character`, two-point distance calculation method. Options are `"eucludian"` or `"manhattan"` distances
`...`	further arguments passed to `geva.dcluster()`, `geva.hcluster()`, or `geva.quantiles()`. In addition, the following arguments are accepted: `eps` : `numeric`, defines the epsilon coefficient for density clustering (see 'Details') `mink.p` : `numeric`, parameter for the Minkowsky metric used in hierarchial clustering. Used as the `p` argument for `fastcluster::hclust.vector()` `verbose` : `logical`, whether to print the current progress (default is `TRUE`)
`grouped.return`	`logical`, whether to concatenate the clustered and summarized data into a single object

The cluster.method determines which grouping subroutine is used to classify the summarized data points based on distance and partitioning. Each option has their equivalent functions that can be called directly: "density" uses geva.dcluster(); "hierarchical" uses geva.hcluster(); and "quantiles" calls geva.quantiles(). However, this wrapper function can also be used to join GEVASummary and GEVAGroupSet objects into a single GEVAGroupedSummary object by setting grouped.return to TRUE.

The cl.score.method argument defines how scores are calculated for each SV point (row in sv) that was assigned to a cluster, (i.e., excluding non-clustered points). If specified as "auto", the parameter will be selected based on the cluster.method: "density" (rate of neighbor points) for the density method; and "hclust.height" (local hierarchy height) for the hierarchical method. The "centroid" method calculates the scores based on the proportional distance between each point to its cluster's centroid. Note that the cl.score.method argument is ignored if cluster.method is "quantiles", since quantile scores are always based on their local centroid distances.

The resolution value is a more accessible way to define the cluster separation threshold used in density and hierarchical clustering methods. Density clusters uses an epsilon value that represents the minimum distance of separation, whereas hierarchical clusters are defined by cutting the hierarchy tree wherever there is a minimum distance between two hierarchies. In this sense, resolution translates a value between 0 and 1 to propotional value for epsilon or hierarchical height (depending on the cluster.method) that would result in the least number of possible clusters for 0 and the highest number for 1. Nevertheless, if epsilon is specified as eps in the optinal arguments, its value is used and resolution is ignored.

This function produces a GEVAGroupSet-derived object, particularly a GEVACluster for the "hierarchical" and "density" cluster methods or a GEVAQuantiles for the "quantiles" method.

However, if grouped.return is TRUE and sv is a GEVASummary object, the produced GEVAGroupSet data will be concatenated to the input and returned as a GEVAGroupedSummary

Other geva.cluster: geva.dcluster(), geva.hcluster(), geva.quantiles()

## Cluster analysis from a randomly generated input 

# Preparing the data
ginput <- geva.ideal.example()      # Generates a random input example
gsummary <- geva.summarize(ginput)  # Summarizes with the default parameters

# Hierarchical clustering
gclust <- geva.cluster(gsummary, cluster.method="hierarchical")
plot(gclust)

# Density clustering
gclust <- geva.cluster(gsummary, cluster.method="density")
plot(gclust)

# Density clustering with slightly more resolution
gclust <- geva.cluster(gsummary,
                       cluster.method="density",
                       resolution=0.35)
plot(gclust)