FeatureAgglomeration | R Documentation |
Recursively merges pairs of clusters of features. This is a wrapper around the Python class sklearn.cluster.FeatureAgglomeration.
rgudhi::PythonClass
-> rgudhi::SKLearnClass
-> rgudhi::BaseClustering
-> FeatureAgglomeration
new()
The FeatureAgglomeration class constructor.
FeatureAgglomeration$new(
  n_clusters = 2L,
  affinity = c("euclidean", "l1", "l2", "manhattan", "cosine", "precomputed"),
  memory = NULL,
  connectivity = NULL,
  compute_full_tree = "auto",
  linkage = c("ward", "complete", "average", "single"),
  pooling_func = rowMeans,
  distance_threshold = NULL,
  compute_distances = FALSE
)
n_clusters
An integer value specifying the number of clusters to find. Defaults to 2L.
affinity
A string or an object coercible into a function via rlang::as_function() specifying the metric used to compute the linkage. If a string, choices are "euclidean", "l1", "l2", "manhattan", "cosine" or "precomputed". If linkage is "ward", only "euclidean" is accepted. Defaults to "euclidean".
memory
A string specifying the path to the caching directory used to store the computation of the tree. Defaults to NULL, in which case no caching is done.
connectivity
A numeric matrix or an object coercible into a function via rlang::as_function() specifying the connectivity matrix. It defines, for each feature, the neighboring features following a given structure of the data. This can be a connectivity matrix itself or a function that transforms the data into a connectivity matrix, such as one derived from sklearn.neighbors.kneighbors_graph(). Defaults to NULL, in which case the hierarchical clustering algorithm is unstructured.
compute_full_tree
The string "auto" or a boolean value specifying whether to compute the full tree (TRUE) or to stop its construction early at n_clusters (FALSE). Stopping early is useful to decrease computation time if the number of clusters is not small compared to the number of features. This option is useful only when specifying a connectivity matrix. Note also that when varying the number of clusters and using caching, it may be advantageous to compute the full tree. It must be TRUE if distance_threshold is not NULL. Defaults to "auto", which is equivalent to TRUE when distance_threshold is not NULL or when n_clusters is less than max(100, 0.02 * n_samples), and to FALSE otherwise.
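The "auto" rule described above can be sketched in base R. The helper name resolve_full_tree() is hypothetical and is not part of rgudhi or scikit-learn; it only restates the documented rule and makes no claim about how either library implements it.

```r
# Hypothetical helper restating the documented "auto" rule for
# compute_full_tree. It is an illustration, not library code.
resolve_full_tree <- function(compute_full_tree = "auto",
                              n_clusters = 2L,
                              n_samples,
                              distance_threshold = NULL) {
  # An explicit TRUE/FALSE is used as given.
  if (!identical(compute_full_tree, "auto")) {
    return(isTRUE(compute_full_tree))
  }
  # A distance threshold always requires the full tree.
  if (!is.null(distance_threshold)) {
    return(TRUE)
  }
  # Otherwise compare n_clusters with max(100, 0.02 * n_samples).
  n_clusters < max(100, 0.02 * n_samples)
}

few_clusters  <- resolve_full_tree(n_clusters = 2L, n_samples = 50L)
many_clusters <- resolve_full_tree(n_clusters = 500L, n_samples = 10000L)
with_threshold <- resolve_full_tree(n_clusters = NULL, n_samples = 10000L,
                                    distance_threshold = 1.5)
```

With these inputs, few_clusters and with_threshold resolve to computing the full tree, while many_clusters does not, because 500 is not less than max(100, 200).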
linkage
A string specifying which linkage criterion to use. The linkage criterion determines which distance to use between sets of features. The algorithm will merge the pairs of clusters that minimize this criterion:
"ward": minimizes the variance of the clusters being merged;
"complete": maximum linkage, uses the maximum of the distances between all features of the two sets;
"average": uses the average of the distances between each feature of the two sets;
"single": uses the minimum of the distances between all features of the two sets.
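To make the three distance-based criteria concrete, the sketch below computes them by hand in base R on a made-up data matrix. It does not call rgudhi; the data and the object names (X, set_a, set_b) are illustrative assumptions, and the "ward" criterion is left out because it is based on variance rather than on pairwise distances.

```r
# Toy data: 5 observations (rows) of 4 features (columns).
X <- cbind(
  f1 = c(1, 2, 3, 4, 5),
  f2 = c(1, 2, 3, 4, 6),
  f3 = c(5, 4, 3, 2, 1),
  f4 = c(9, 9, 9, 9, 9)
)

# Features are the objects being clustered, so distances are taken
# between columns: transpose before calling dist().
D <- as.matrix(dist(t(X), method = "euclidean"))

# Two candidate sets of features and the distances between their members.
set_a <- c("f1", "f2")
set_b <- c("f3", "f4")
between <- D[set_a, set_b]

linkage_values <- c(
  complete = max(between),   # largest pairwise distance
  average  = mean(between),  # mean pairwise distance
  single   = min(between)    # smallest pairwise distance
)
linkage_values
```

For any two sets, the single value is never larger than the average value, which is never larger than the complete value.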
pooling_func
An object coercible into a function via rlang::as_function() specifying the aggregation method to combine the values of agglomerated features into a single value. It should take as input an array of shape M × N and the optional argument axis = 1, and reduce it to an array of shape M. Defaults to base::rowMeans.
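The shape contract for a pooling function can be checked in base R without calling rgudhi. In the sketch below, the matrix x and the helper row_medians() are hypothetical; the helper is just one possible alternative to the default and accepts `...` so that any extra argument, such as axis, is ignored if it is passed.

```r
# Toy array of shape M x N with M = 3 and N = 4: each row holds the values
# to be combined into a single pooled value.
x <- matrix(c(1, 2, 3, 10,
              4, 4, 4, 4,
              0, 1, 0, 100),
            nrow = 3, byrow = TRUE)

# Default pooling function: reduces the M x N array to a vector of length M.
pooled_mean <- rowMeans(x)

# Hypothetical alternative: row-wise median, which is less sensitive to a
# single extreme value within a row.
row_medians <- function(x, ...) apply(x, 1, stats::median)
pooled_median <- row_medians(x)
```

Both functions return one value per row, which is the reduction from shape M × N to shape M described above.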
distance_threshold
A numeric value specifying the linkage distance threshold above which clusters will not be merged. If not NULL, n_clusters must be NULL and compute_full_tree must be TRUE. Defaults to NULL.
compute_distances
A boolean value specifying whether to compute distances between clusters even if distance_threshold is not used. This can be used to draw a dendrogram, but introduces a computational and memory overhead. Defaults to FALSE.
An object of class FeatureAgglomeration.
clone()
The objects of this class are cloneable with this method.
FeatureAgglomeration$clone(deep = FALSE)
deep
Whether to make a deep clone.
cl <- FeatureAgglomeration$new()