clustering | R Documentation |
This function clusters genomes using mash data, accnet data,
or igraph data. The objects produced by the accnet function,
the mash function and/or the knnn function can be clustered.
accnet objects are clustered using the Jaccard distance computed from
gene/protein presence/absence data. mash objects use the mash distance
values directly. igraph objects can be clustered using the methods
available in igraph.
clustering(data, method, n_clust, d_reduction = FALSE)
data |
An object of class accnet/mash/igraph |
method |
Method of clustering |
n_clust |
Number of clusters (only for hierarchical methods) |
d_reduction |
Boolean. Perform a dimensionality reduction (umap) prior to the clustering process. |
A membership data.frame with the columns "Source" and "Cluster"
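The following base R sketch illustrates the kind of distance-based clustering described for accnet objects and the shape of the returned membership data.frame. It is not the package's internal implementation: the toy matrix, the genome and gene names, and the choice of average linkage are assumptions made for illustration only.

```r
# Toy presence/absence matrix: rows are genomes, columns are genes/proteins.
# Genome and gene names are made up for illustration.
pa <- matrix(
  c(1, 1, 1, 0, 0, 0,
    1, 1, 0, 0, 0, 0,
    0, 0, 0, 1, 1, 1,
    0, 0, 1, 1, 1, 1),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("genome_", 1:4), paste0("gene_", 1:6))
)

# dist(method = "binary") gives the Jaccard distance on presence/absence data.
jaccard <- dist(pa, method = "binary")

# Hierarchical clustering, then cut the tree into n_clust groups.
# Average linkage is an arbitrary choice for this sketch.
n_clust <- 2
tree <- hclust(jaccard, method = "average")
groups <- cutree(tree, k = n_clust)

# Membership table with the same column names as the documented return value.
membership <- data.frame(
  Source = names(groups),
  Cluster = unname(groups),
  stringsAsFactors = FALSE
)
print(membership)
```

In this toy case the first two genomes share most of their genes, as do the last two, so cutting the tree at two clusters separates the two pairs.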
Clustering of igraph objects depends on how the network was built
(see the knnn function), and the number of clusters may vary between
different settings of the k-nn network. Network-based methods are faster than
distance-based methods.
Dimensionality reduction tries to overcome "the curse of
dimensionality" (more variables than samples:
https://en.wikipedia.org/wiki/Curse_of_dimensionality). Using
umap from the uwot package, we reduce the dimensionality of
the dataset to two. Note that methods based on HDBSCAN always perform the
dimensionality reduction.
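As a rough illustration of reducing a dataset to two dimensions before clustering, the sketch below calls uwot::umap directly on random toy data. The data, the n_neighbors value, and the follow-up hierarchical clustering are assumptions for illustration and may differ from what clustering() does internally. The example is skipped if uwot is not installed.

```r
# Assumes the uwot package is installed; the example is skipped otherwise.
if (requireNamespace("uwot", quietly = TRUE)) {
  set.seed(1)
  # Toy data: 30 samples and 50 variables (more variables than samples).
  x <- matrix(rnorm(30 * 50), nrow = 30)

  # Reduce the dataset to two dimensions.
  embedding <- uwot::umap(x, n_neighbors = 10, n_components = 2)

  # Cluster the 2-D embedding with a distance-based method (illustrative choice).
  groups <- cutree(hclust(dist(embedding)), k = 3)
  print(table(groups))
}
```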
There is no universal criterion for selecting the number of clusters, and the best
configuration for one dataset may not be the best one for others.
If you want to know more about clustering, we recommend the book "Practical Guide To
Cluster Analysis in R" by Alboukadel Kassambara (STHDA ed.).
For more information: knnn, accnet, mash, igraph.