geva.dcluster: GEVA Density Clustering

View source: R/dclustering.R

Description

Performs a density cluster analysis from summarized data.

Usage

geva.dcluster(
  sv,
  resolution = 0.3,
  dcluster.method = options.dcluster.method,
  cl.score.method = options.cl.score.method,
  minpts = 2,
  ...,
  eps = NA_real_,
  include.raw.results = FALSE
)

options.dcluster.method
# c("dbscan", "optics")

Arguments

sv

a numeric SVTable object (usually GEVASummary)

resolution

numeric (0 to 1), used as a "zoom" parameter for cluster detection. A value of 0 returns the minimum number of clusters that can be detected, while 1 returns the maximum number of detectable clusters. Ignored if eps is specified

dcluster.method

character, density-based method for cluster separation

cl.score.method

character, method used to calculate the cluster scores for each point. If "auto", the "density" method is selected

minpts

integer, minimum number of points required to form a cluster

...

additional arguments. Accepts verbose (logical, default is TRUE) to enable or disable printing the current progress

eps

numeric, maximum neighborhood distance between points to be clustered

include.raw.results

logical, whether to attach intermediate results to the returned object

Details

This function performs a density cluster analysis using methods implemented in the dbscan package. The available methods for the dcluster.method argument are "dbscan" and "optics", which internally call dbscan::dbscan() and dbscan::optics(), respectively.
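As a minimal sketch of choosing between the two methods, the calls below run the same summarized input through each one. It uses only functions and arguments named on this page; the object names gclust.dbscan and gclust.optics are illustrative, and the input is random, so the resulting clusters will differ between runs.

```r
library(geva)

ginput   <- geva.ideal.example()      # random example input
gsummary <- geva.summarize(ginput)    # summarized with the default parameters

# DBSCAN-based separation
gclust.dbscan <- geva.dcluster(gsummary, dcluster.method = "dbscan")

# OPTICS-based separation, same default resolution and minpts
gclust.optics <- geva.dcluster(gsummary, dcluster.method = "optics")

# Compare the two results visually
plot(gclust.dbscan)
plot(gclust.optics)
```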

The resolution value is an accessible way to define the cluster separation threshold used in density clustering. The DBSCAN algorithm uses an epsilon value that represents the minimum distance of separation, and resolution translates a value between 0 and 1 to a proportional value within the acceptable range of epsilon values. As a result, a resolution of 0 yields the fewest possible clusters and a resolution of 1 yields the most. Nevertheless, if epsilon is specified as eps in the optional arguments, its value is used and resolution is ignored.
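The idea of a proportional mapping can be sketched in base R. This is not the formula geva uses internally: the helper resolution_to_eps and the bounds eps_min and eps_max are hypothetical, and the sketch assumes that a larger epsilon merges points into fewer clusters, so resolution 0 maps to the largest epsilon and resolution 1 to the smallest.

```r
# Hypothetical helper, for illustration only (NOT geva's internal formula).
# Linearly maps resolution in [0, 1] onto an assumed range of usable eps values:
#   resolution = 0 -> eps_max (assumed to give the fewest clusters)
#   resolution = 1 -> eps_min (assumed to give the most clusters)
resolution_to_eps <- function(resolution, eps_min, eps_max) {
  stopifnot(
    is.numeric(resolution), resolution >= 0, resolution <= 1,
    eps_min > 0, eps_min <= eps_max
  )
  eps_max - resolution * (eps_max - eps_min)
}

# Assumed eps range of [0.1, 2.0], chosen arbitrarily for the example
eps.low.res  <- resolution_to_eps(0,   eps_min = 0.1, eps_max = 2.0)
eps.default  <- resolution_to_eps(0.3, eps_min = 0.1, eps_max = 2.0)
eps.high.res <- resolution_to_eps(1,   eps_min = 0.1, eps_max = 2.0)

print(c(eps.low.res, eps.default, eps.high.res))
```

Under these assumptions, raising resolution shrinks epsilon, which is why passing eps directly makes resolution redundant.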

The cl.score.method argument defines how scores are calculated for each SV point (row in sv) that was assigned to a cluster (i.e., excluding non-clustered points). If specified as "auto", the "density" method is selected, which is based on the rate of neighbor points.

If include.raw.results is TRUE, some additional data will be attached to the info slot of the returned GEVACluster object, including the kNN tree generated during the intermediate steps.
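A short sketch of how the attached data might be inspected follows. It assumes the info slot can be read with the standard S4 @ operator; the names of the entries it holds are not documented on this page, so the example only prints an overview rather than accessing specific elements.

```r
library(geva)

ginput   <- geva.ideal.example()
gsummary <- geva.summarize(ginput)

# Request the intermediate results along with the clustering
gclust.raw <- geva.dcluster(gsummary, include.raw.results = TRUE)

# Overview of what was attached to the 'info' slot
# (direct slot access is an assumption; entry names are not listed here)
str(gclust.raw@info, max.level = 1)
```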

Value

A GEVACluster object

Note

In density clustering, only the most dense points are clustered. For the unclustered points, the grouping value is set to NA.
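To illustrate why some points end up without a group, the sketch below calls dbscan::dbscan() directly on a small hand-made matrix. dbscan labels noise points as 0; recoding them to NA is shown here as an assumption about how a wrapper could produce the NA grouping described above, not as geva's actual code. The data and the names pts, res and grouping are made up for the example.

```r
library(dbscan)

# Two tight groups of three points each, plus one isolated point
pts <- rbind(
  c(0.0, 0.0), c(0.0, 0.1), c(0.1, 0.0),
  c(5.0, 5.0), c(5.0, 5.1), c(5.1, 5.0),
  c(10.0, -10.0)
)

# minPts counts the point itself, so the isolated point cannot form a cluster
res <- dbscan(pts, eps = 0.5, minPts = 2)
res$cluster   # 1 1 1 2 2 2 0 (dbscan marks noise as 0)

# Recode noise as NA (assumed to mirror the NA grouping for unclustered points)
grouping <- factor(ifelse(res$cluster == 0L, NA, res$cluster))
table(grouping, useNA = "ifany")
```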

See Also

Other geva.cluster: geva.cluster(), geva.hcluster(), geva.quantiles()

Examples

## Density clustering from a randomly generated input

library(geva)

# Preparing the data
ginput <- geva.ideal.example()      # Generates a random input example
gsummary <- geva.summarize(ginput)  # Summarizes with the default parameters

# Density clustering
gclust <- geva.dcluster(gsummary)
plot(gclust)

# Density clustering with slightly more resolution
gclust <- geva.dcluster(gsummary, resolution=0.35)
plot(gclust)

sbcblab/geva documentation built on March 15, 2021, 10:08 p.m.