AgglomerativeClustering (R Documentation)
Recursively merges pairs of clusters of sample data, using the linkage distance. This is a wrapper around the Python class sklearn.cluster.AgglomerativeClustering.
rgudhi::PythonClass
-> rgudhi::SKLearnClass
-> rgudhi::BaseClustering
-> AgglomerativeClustering
new()
The AgglomerativeClustering class constructor.
AgglomerativeClustering$new(
  n_clusters = 2L,
  affinity = c("euclidean", "l1", "l2", "manhattan", "cosine", "precomputed"),
  memory = NULL,
  connectivity = NULL,
  compute_full_tree = "auto",
  linkage = c("ward", "complete", "average", "single"),
  distance_threshold = NULL,
  compute_distances = FALSE
)
n_clusters
An integer value specifying the number of clusters to find. It must be NULL if distance_threshold is not NULL. Defaults to 2L.
affinity
A string specifying the metric used to compute the linkage. Can be "euclidean", "l1", "l2", "manhattan", "cosine" or "precomputed". If linkage is "ward", only "euclidean" is accepted. If "precomputed", a distance matrix (instead of a similarity matrix) is needed as input for the $fit() method. Defaults to "euclidean".
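With affinity set to "precomputed", the input is a matrix of pairwise distances rather than the raw samples. A minimal sketch of preparing such a matrix with base R follows; the data in `X` are made up for illustration, and the call to the wrapper is only shown as a comment because it is an assumed usage that needs a working Python installation with scikit-learn.

```r
# Toy data: 5 samples in 2 dimensions (made up for illustration).
X <- matrix(c(0, 0,
              0, 1,
              1, 0,
              5, 5,
              5, 6), ncol = 2, byrow = TRUE)

# Pairwise Manhattan distances as a full symmetric matrix.
# These are distances (zero on the diagonal), not similarities.
D <- as.matrix(stats::dist(X, method = "manhattan"))

# Assumed usage with the wrapper (not run here). Note that "ward" linkage
# is documented to accept only "euclidean", so another linkage is chosen.
# cl <- AgglomerativeClustering$new(affinity = "precomputed", linkage = "average")
# cl$fit(D)
```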
memory
A string specifying the path to the caching directory. Defaults to NULL, in which case no caching is done.
connectivity
Either a numeric matrix or an object of class stats::dist or an object coercible into a function by rlang::as_function() specifying for each sample the neighboring samples following a given structure of the data. This can be a connectivity matrix itself or a function that transforms the data into a connectivity matrix. Defaults to NULL, i.e., the hierarchical clustering algorithm is unstructured.
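One way to picture a function that transforms the data into a connectivity matrix is a k-nearest-neighbour graph. The sketch below builds one in base R; `knn_connectivity()` is a hypothetical helper written for this illustration and is not part of rgudhi or scikit-learn.

```r
# Hypothetical helper: symmetric k-nearest-neighbour connectivity matrix.
# Entry [i, j] is 1 when j is among the k nearest neighbours of i, or vice versa.
knn_connectivity <- function(X, k = 2L) {
  D <- as.matrix(stats::dist(X))
  n <- nrow(D)
  diag(D) <- Inf  # a sample is not its own neighbour
  A <- matrix(0, n, n)
  for (i in seq_len(n)) {
    nn <- order(D[i, ])[seq_len(k)]
    A[i, nn] <- 1
  }
  # Symmetrise so that the neighbourhood relation is undirected.
  pmax(A, t(A))
}

# Toy 1-D data with two groups (made up for illustration).
X <- matrix(c(0, 1, 2, 10, 11, 12), ncol = 1)
A <- knn_connectivity(X, k = 1L)
```

Under the description above, either the matrix `A` or the function `knn_connectivity` itself would be a candidate value for connectivity; whether a given structure suits the data is a modelling choice.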
compute_full_tree
Either a boolean value or the string "auto" specifying whether to build the full tree (TRUE) or to stop its construction early at n_clusters (FALSE). Stopping early is useful to decrease computation time if the number of clusters is not small compared to the number of samples. This option is useful only when specifying a connectivity matrix. Note also that when varying the number of clusters and using caching, it may be advantageous to compute the full tree. It must be TRUE if distance_threshold is not NULL. Defaults to "auto", which is equivalent to TRUE when distance_threshold is not NULL or when n_clusters is less than the maximum of 100 and 0.02 * n_samples. Otherwise, "auto" is equivalent to FALSE.
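The "auto" rule can be written out as a small function. This is only a sketch of the rule as stated in this description, not the library's internal code; `resolve_full_tree()` is a hypothetical helper name.

```r
# Hypothetical helper mirroring the documented "auto" rule.
resolve_full_tree <- function(n_clusters, n_samples, distance_threshold = NULL) {
  # A distance threshold always requires the full tree.
  if (!is.null(distance_threshold)) {
    return(TRUE)
  }
  # Otherwise compare n_clusters with max(100, 0.02 * n_samples).
  n_clusters < max(100, 0.02 * n_samples)
}

few_clusters   <- resolve_full_tree(2L, 500L)      # 2 < max(100, 10)
many_clusters  <- resolve_full_tree(300L, 10000L)  # 300 >= max(100, 200)
with_threshold <- resolve_full_tree(NULL, 10000L, distance_threshold = 1.5)
```

Here `few_clusters` and `with_threshold` are TRUE, while `many_clusters` is FALSE.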
linkage
A string specifying which linkage criterion to use. The linkage criterion determines which distance to use between sets of observations. The algorithm will merge the pairs of clusters that minimize this criterion.
ward: minimizes the variance of the clusters being merged;
average: uses the average of the distances of each observation of the two sets;
complete: uses the maximum of the distances between all observations of the two sets;
single: uses the minimum of the distances between all observations of the two sets.
Defaults to "ward".
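To make the four criteria concrete, the sketch below computes each one by hand in base R for two tiny made-up clusters. The Ward quantity shown is one common formulation (the increase in within-cluster sum of squares caused by the merge); the value reported by the underlying library may be scaled differently.

```r
# Two small clusters of 1-D points (made up for illustration).
A <- matrix(c(0, 1), ncol = 1)
B <- matrix(c(4, 6), ncol = 1)

# All pairwise Euclidean distances between members of A and members of B.
all_d <- as.matrix(stats::dist(rbind(A, B)))
cross <- all_d[seq_len(nrow(A)), nrow(A) + seq_len(nrow(B))]

single_link   <- min(cross)   # closest pair across the two sets
complete_link <- max(cross)   # farthest pair across the two sets
average_link  <- mean(cross)  # mean over all cross pairs

# Ward: increase in within-cluster sum of squares caused by merging A and B.
wss <- function(M) sum(sweep(M, 2, colMeans(M))^2)
ward_increase <- wss(rbind(A, B)) - wss(A) - wss(B)
```

For these points the cross distances are 3, 4, 5 and 6, so the single, complete and average values are 3, 6 and 4.5, and the Ward increase is 20.25.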
distance_threshold
A numeric value specifying the linkage distance threshold above which clusters will not be merged. If not NULL, n_clusters must be NULL and compute_full_tree must be TRUE. Defaults to NULL.
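With a distance threshold, the number of clusters becomes an output rather than an input, which is why n_clusters must be NULL. The idea can be sketched with a base R analogue, stats::hclust() followed by stats::cutree(); this is not the wrapper itself, and the toy data are made up for illustration.

```r
# Toy 1-D data with two well-separated groups (made up for illustration).
x <- c(0, 0.5, 1, 10, 10.5, 11)

# Build the full tree with complete linkage.
tree <- stats::hclust(stats::dist(x), method = "complete")

# Cut at height 3: merges whose linkage distance exceeds 3 are not performed.
labels <- stats::cutree(tree, h = 3)
n_found <- length(unique(labels))
```

Here the within-group merges happen at heights of at most 1 and the final merge at height 11, so cutting at 3 leaves two clusters.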
compute_distances
A boolean value specifying whether to compute distances between clusters even if distance_threshold is not used. This can be used to produce dendrogram visualizations, but introduces a computational and memory overhead. Defaults to FALSE.
An object of class AgglomerativeClustering.
clone()
The objects of this class are cloneable with this method.
AgglomerativeClustering$clone(deep = FALSE)
deep
Whether to make a deep clone.
cl <- AgglomerativeClustering$new()