| nhclu_dbscan | R Documentation |
This function performs non-hierarchical clustering based on dissimilarity using the Density-Based Spatial Clustering of Applications with Noise (DBSCAN) algorithm.
nhclu_dbscan(
  dissimilarity,
  index = names(dissimilarity)[3],
  minPts = NULL,
  eps = NULL,
  plot = TRUE,
  algorithm_in_output = TRUE,
  ...
)
dissimilarity |
The output object from |
index |
The name or number of the dissimilarity column to use. By
default, the third column name of |
minPts |
A |
eps |
A |
plot |
A |
algorithm_in_output |
A |
... |
Additional arguments to be passed to |
The DBSCAN (Density-Based Spatial Clustering of Applications with Noise)
algorithm clusters points based on the density of neighbors around each
data point. It requires two main arguments: minPts, the minimum number of
points required to identify a core point, and eps, the radius used to find
neighbors.
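To make the roles of minPts and eps concrete, the following is a minimal base R sketch of the core-point test on toy data. The values of x, eps and minPts are hypothetical, and the sketch assumes that a point counts itself among its neighbors, which is the convention documented for dbscan::dbscan. It illustrates the idea only and is not the package's implementation.

```r
# Five points on a line: four close together, one far away (toy values)
x <- c(0, 0.1, 0.2, 0.3, 5)
d <- as.matrix(dist(x))

eps <- 0.25   # neighborhood radius (hypothetical value)
minPts <- 3   # minimum neighborhood size for a core point (hypothetical value)

# Number of points within eps of each point, counting the point itself
n_within <- rowSums(d <= eps)

# A point is a core point when its eps-neighborhood holds at least minPts points
is_core <- n_within >= minPts
print(is_core)
```

Here the four nearby points are core points, while the isolated point at 5 has only itself within eps and would be treated as noise.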
Choosing minPts: This determines how many points are necessary to form a cluster. For example, what is the minimum number of sites expected in a bioregion? Choose a value sufficiently large for your dataset and expectations.
Choosing eps: This determines how similar sites should be to form a
cluster. If eps is too small, most points will be considered too distinct
and marked as noise. If eps is too large, clusters may merge. The value of
eps depends on minPts. It is recommended to choose eps by identifying
a knee in the k-nearest neighbor distance plot.
By default, the function attempts to find a knee in this curve
automatically, but this automatic estimate is not always reliable. Users
should inspect the graph and modify eps accordingly. To explore eps values,
run the function initially without defining eps, review the recommendations,
and adjust as needed based on clustering results.
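The k-nearest neighbor distance curve described above can be sketched in base R as follows. The toy data, the choice k = 4, and the knee heuristic (largest vertical gap between the sorted curve and the straight line joining its end points) are all assumptions made for illustration; this is not necessarily how nhclu_dbscan locates the knee. In practice k is often tied to minPts (for example minPts - 1), and the dbscan package provides kNNdistplot for drawing this curve.

```r
set.seed(1)

# Toy data: two tight groups plus a few scattered points (hypothetical)
pts <- rbind(
  matrix(rnorm(40, mean = 0, sd = 0.1), ncol = 2),
  matrix(rnorm(40, mean = 2, sd = 0.1), ncol = 2),
  matrix(runif(10, min = -1, max = 3), ncol = 2)
)
d <- as.matrix(dist(pts))

k <- 4  # number of neighbors considered (assumed; usually linked to minPts)

# Distance from each point to its k-th nearest neighbor, sorted increasingly.
# Position k + 1 is used because position 1 is the point itself (distance 0).
knn_dist <- sort(apply(d, 1, function(row) sort(row)[k + 1]))

# Crude knee heuristic: largest vertical gap between the chord joining the
# two ends of the curve and the curve itself
n <- length(knn_dist)
x <- seq_len(n)
chord <- knn_dist[1] + (knn_dist[n] - knn_dist[1]) * (x - 1) / (n - 1)
knee <- which.max(chord - knn_dist)
eps_candidate <- knn_dist[knee]

# Draw the curve only in an interactive session
if (interactive()) {
  plot(x, knn_dist, type = "l",
       xlab = "Points sorted by distance",
       ylab = paste0(k, "-nearest neighbor distance"))
  abline(h = eps_candidate, lty = 2)
}
```

The resulting eps_candidate is only a starting point; as noted above, the curve should still be inspected and eps adjusted based on the clustering results.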
A list of class bioregion.clusters with five components:
name: A character string containing the name of the algorithm.
args: A list of input arguments as provided by the user.
inputs: A list of characteristics of the clustering process.
algorithm: A list of all objects associated with the clustering
procedure, such as original cluster objects (only if
algorithm_in_output = TRUE).
clusters: A data.frame containing the clustering results.
If algorithm_in_output = TRUE, the algorithm slot includes the output of
dbscan::dbscan.
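As a rough illustration of the structure described above, the sketch below runs the function on simulated data and inspects the listed components. It assumes the bioregion package is installed and is skipped otherwise; the simulated matrix and eps = 0.2 are arbitrary choices, and no particular output is implied.

```r
# Component names as listed in the return value description
expected_components <- c("name", "args", "inputs", "algorithm", "clusters")

if (requireNamespace("bioregion", quietly = TRUE)) {
  set.seed(1)
  # Simulated site-by-species matrix (hypothetical data)
  comat <- matrix(sample(0:1000, size = 500, replace = TRUE,
                         prob = 1 / 1:1001), 20, 25)
  rownames(comat) <- paste0("Site", 1:20)
  colnames(comat) <- paste0("Species", 1:25)
  dissim <- bioregion::dissimilarity(comat, metric = "all")

  clust <- bioregion::nhclu_dbscan(dissim, index = "Simpson", eps = 0.2,
                                   plot = FALSE)

  print(class(clust))                                # class of the output
  print(setdiff(expected_components, names(clust)))  # components not found
  print(clust$name)                                  # algorithm name
  str(clust$args)                                    # arguments as provided
  print(head(clust$clusters))                        # first clustering rows
}
```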
Boris Leroy (leroy.boris@gmail.com)
Pierre Denelle (pierre.denelle@gmail.com)
Maxime Lenormand (maxime.lenormand@inrae.fr)
Hahsler M, Piekenbrock M & Doran D (2019) dbscan: Fast density-based clustering with R. Journal of Statistical Software, 91(1), 1–30.
For more details illustrated with a practical example, see the vignette: https://biorgeo.github.io/bioregion/articles/a4_2_non_hierarchical_clustering.html.
Associated functions: nhclu_clara, nhclu_clarans, nhclu_kmeans, nhclu_pam, nhclu_affprop
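# Simulate a site-by-species matrix (20 sites, 25 species)
comat <- matrix(sample(0:1000, size = 500, replace = TRUE, prob = 1/1:1001),
                20, 25)
rownames(comat) <- paste0("Site", 1:20)
colnames(comat) <- paste0("Species", 1:25)

# Compute dissimilarities between sites
dissim <- dissimilarity(comat, metric = "all")

# Run without eps so the function looks for a knee automatically
clust1 <- nhclu_dbscan(dissim, index = "Simpson")
# Set eps manually
clust2 <- nhclu_dbscan(dissim, index = "Simpson", eps = 0.2)
# Explore several minPts and eps values in one call
clust3 <- nhclu_dbscan(dissim, index = "Simpson", minPts = c(5, 10, 15, 20),
                       eps = c(.1, .15, .2, .25, .3))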