nhclu_dbscan | R Documentation
This function performs non-hierarchical clustering based on dissimilarity using the Density-Based Spatial Clustering of Applications with Noise (DBSCAN) algorithm.
nhclu_dbscan(
  dissimilarity,
  index = names(dissimilarity)[3],
  minPts = NULL,
  eps = NULL,
  plot = TRUE,
  algorithm_in_output = TRUE,
  ...
)
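dissimilarity: The output object from dissimilarity(), as in the Examples.
index: The name or number of the dissimilarity column to use. By default, the third column name of dissimilarity is used.
minPts: A numeric value or vector giving minPts, the minimum number of points required to identify a core point (see Details). Several values can be supplied, as in the Examples.
eps: A numeric value or vector giving eps, the radius used to find neighbors (see Details). If eps is not defined (NULL, the default), the function attempts to find a suitable value automatically from the k-nearest neighbor distance plot.
plot: A logical value (TRUE by default) indicating whether the k-nearest neighbor distance plot should be drawn.
algorithm_in_output: A logical value (TRUE by default) indicating whether the original output of dbscan::dbscan() should be included in the output (see Value).
...: Additional arguments to be passed to dbscan::dbscan().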
The DBSCAN (Density-Based Spatial Clustering of Applications with Noise) algorithm clusters points based on the density of neighbors around each data point. It requires two main arguments: minPts, the minimum number of points required in a point's neighborhood for that point to be a core point, and eps, the radius used to find neighbors.
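To make the roles of minPts and eps concrete, the following is a minimal base-R sketch of the DBSCAN logic applied to a dist object. It is an illustration under stated assumptions, not the code used by nhclu_dbscan (which relies on dbscan::dbscan()): the helper name toy_dbscan is hypothetical, a point's neighborhood is assumed to include the point itself, and noise is labeled 0.

```r
# Hypothetical helper (illustration only, not part of bioregion):
# a minimal DBSCAN on a "dist" object. Neighborhoods include the point
# itself; the returned label 0 means noise.
toy_dbscan <- function(d, eps, minPts) {
  m <- as.matrix(d)
  n <- nrow(m)
  # Neighbors of each point: all points within distance eps (self included)
  neighbors <- lapply(seq_len(n), function(i) which(m[i, ] <= eps))
  is_core <- lengths(neighbors) >= minPts
  cluster <- integer(n)  # 0 = not (yet) assigned, i.e. noise
  current <- 0L
  for (i in seq_len(n)) {
    if (!is_core[i] || cluster[i] != 0L) next
    # Start a new cluster from an unassigned core point and expand it
    current <- current + 1L
    cluster[i] <- current
    queue <- neighbors[[i]]
    while (length(queue) > 0) {
      j <- queue[1]
      queue <- queue[-1]
      if (cluster[j] == 0L) {
        cluster[j] <- current
        # Only core points propagate the cluster further
        if (is_core[j]) queue <- c(queue, neighbors[[j]])
      }
    }
  }
  cluster
}

# Two tight groups on a line plus one isolated point
x <- c(0, 0.1, 0.2, 5, 5.1, 5.2, 20)
toy_dbscan(dist(x), eps = 0.15, minPts = 3)
# expected: 1 1 1 2 2 2 0
```

The isolated point at 20 has no other point within eps, so it never joins a cluster and stays labeled as noise.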
Choosing minPts: This determines how many points are needed to form a cluster. A useful question is: what is the minimum number of sites you expect in a bioregion? Choose a value large enough for your dataset and expectations.
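One way to see how minPts controls density is to count, for a fixed eps, how many sites qualify as core points under several candidate minPts values. Because every cluster grows from at least one core point, and a core point has at least minPts points (itself included) in its neighborhood, a cluster typically contains at least about minPts sites. The sketch below uses base R only; the helper count_core_points is hypothetical, and it assumes the self-inclusive neighborhood convention of dbscan::dbscan().

```r
# Hypothetical helper (illustration only, not part of bioregion):
# number of core points obtained for each candidate minPts at a fixed eps
count_core_points <- function(d, eps, minPts_values) {
  m <- as.matrix(d)
  # Neighborhood size of each point, the point itself included
  nb_size <- rowSums(m <= eps)
  vapply(minPts_values, function(k) sum(nb_size >= k), integer(1))
}

x <- c(0, 0.1, 0.2, 5, 5.1, 5.2, 20)
count_core_points(dist(x), eps = 0.15, minPts_values = 2:4)
# expected: 6 2 0
```

With minPts = 4 no site is dense enough to be a core point, so every site would be treated as noise at this eps.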
Choosing eps: This determines how similar sites must be to form a cluster. If eps is too small, most points will be considered too distinct and marked as noise. If eps is too large, clusters may merge. The appropriate value of eps depends on minPts. It is recommended to choose eps by identifying a knee in the k-nearest neighbor distance plot.
By default, the function attempts to find a knee in this curve automatically, but the result is uncertain. Users should inspect the graph and adjust eps accordingly. To explore eps values, run the function initially without defining eps, review the recommendations, and adjust as needed based on the clustering results.
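The k-nearest neighbor distance plot and an automatic knee search can be sketched in base R as follows. Both helpers are hypothetical illustrations: knn_dist_sorted() computes each site's distance to its k-th nearest neighbor, and find_knee() uses a simple "farthest point from the chord" heuristic, which may differ from the method nhclu_dbscan actually uses. Following the dbscan package convention, where minPts counts the point itself, k = minPts - 1 is assumed here.

```r
# Hypothetical helper: sorted distances to the k-th nearest neighbor
knn_dist_sorted <- function(d, k) {
  m <- as.matrix(d)
  diag(m) <- Inf  # ignore each point's zero distance to itself
  sort(unname(apply(m, 1, function(row) sort(row)[k])))
}

# Hypothetical knee heuristic: after rescaling both axes to [0, 1], return the
# value whose point lies farthest below the chord from the first to the last point
find_knee <- function(y) {
  if (max(y) == min(y)) return(y[1])  # flat curve: no knee to find
  xs <- (seq_along(y) - 1) / (length(y) - 1)
  ys <- (y - min(y)) / (max(y) - min(y))
  # The chord runs from (0, 0) to (1, 1), so xs - ys is proportional to the
  # distance below it, which peaks at the knee of an increasing convex curve
  y[which.max(xs - ys)]
}

x <- c(0, 0.1, 0.2, 5, 5.1, 5.2, 20)
minPts <- 3
kd <- knn_dist_sorted(dist(x), k = minPts - 1)
plot(kd, type = "b", xlab = "Points sorted by distance",
     ylab = paste0(minPts - 1, "-NN distance"))
eps_guess <- find_knee(kd)
abline(h = eps_guess, lty = 2)  # candidate eps, to be checked visually
```

On this toy input the heuristic lands on the last value before the sharp jump caused by the isolated point, which is the kind of candidate that should then be confirmed by eye on the plot.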
A list of class bioregion.clusters with five components:
name: A character string containing the name of the algorithm.
args: A list of input arguments as provided by the user.
inputs: A list of characteristics of the clustering process.
algorithm: A list of all objects associated with the clustering procedure, such as original cluster objects (only if algorithm_in_output = TRUE).
clusters: A data.frame containing the clustering results.
If algorithm_in_output = TRUE, the algorithm slot includes the output of dbscan::dbscan().
Boris Leroy (leroy.boris@gmail.com)
Pierre Denelle (pierre.denelle@gmail.com)
Maxime Lenormand (maxime.lenormand@inrae.fr)
Hahsler M, Piekenbrock M & Doran D (2019) dbscan: Fast density-based clustering with R. Journal of Statistical Software, 91(1), 1–30.
For more details illustrated with a practical example, see the vignette: https://biorgeo.github.io/bioregion/articles/a4_2_non_hierarchical_clustering.html.
Associated functions: nhclu_clara, nhclu_clarans, nhclu_kmeans, nhclu_pam, nhclu_affprop.
library(bioregion)

# Simulate a site-by-species matrix (20 sites, 25 species)
comat <- matrix(sample(0:1000, size = 500, replace = TRUE, prob = 1/1:1001),
                20, 25)
rownames(comat) <- paste0("Site", 1:20)
colnames(comat) <- paste0("Species", 1:25)

dissim <- dissimilarity(comat, metric = "all")

# eps not defined: the function looks for a knee in the k-NN distance plot
clust1 <- nhclu_dbscan(dissim, index = "Simpson")
# Fixed eps
clust2 <- nhclu_dbscan(dissim, index = "Simpson", eps = 0.2)
# Several minPts and eps values
clust3 <- nhclu_dbscan(dissim, index = "Simpson", minPts = c(5, 10, 15, 20),
                       eps = c(.1, .15, .2, .25, .3))