DBSCAN | R Documentation |
DBSCAN - Density-Based Spatial Clustering of Applications with Noise. Finds core samples of high density and expands clusters from them. Good for data which contains clusters of similar density. This is a wrapper around the Python class sklearn.cluster.DBSCAN.
rgudhi::PythonClass -> rgudhi::SKLearnClass -> rgudhi::BaseClustering -> DBSCAN
new()
The DBSCAN class constructor.
DBSCAN$new(
  eps = 0.5,
  min_samples = 5L,
  metric = "euclidean",
  metric_params = NULL,
  algorithm = c("auto", "ball_tree", "kd_tree", "brute"),
  leaf_size = 30L,
  p = 2L,
  n_jobs = 1L
)
eps
A numeric value specifying the maximum distance between two
samples for one to be considered as in the neighborhood of the other.
This is not a maximum bound on the distances of points within a
cluster. This is the most important DBSCAN parameter to choose
appropriately for your data set and distance function. Defaults to 0.5.
min_samples
An integer value specifying the number of samples (or
total weight) in a neighborhood for a point to be considered as a core
point. This includes the point itself. Defaults to 5L.
metric
Either a string or an object coercible into a function via
rlang::as_function() specifying the metric to use when calculating
distance between instances in a feature array. If metric is a string,
it must be one of the options allowed by
sklearn.metrics.pairwise_distances for its metric parameter. If metric
is "precomputed", X is assumed to be a distance matrix and must be
square. X may be a sparse graph, in which case only nonzero elements
may be considered neighbors for DBSCAN. Defaults to "euclidean".
metric_params
A named list specifying additional parameters to be
passed on to the metric function. Defaults to NULL.
algorithm
A string specifying the algorithm to be used by the
sklearn.neighbors.NearestNeighbors module to compute pointwise
distances and find nearest neighbors. Choices are "auto", "ball_tree",
"kd_tree" or "brute". Defaults to "auto".
leaf_size
An integer value specifying the leaf size passed to
sklearn.neighbors.BallTree or sklearn.neighbors.KDTree.
This can affect the speed of the construction and query, as well as the
memory required to store the tree. The optimal value depends on the
nature of the problem. Defaults to 30L.
p
An integer value specifying the power of the Minkowski metric to
be used to calculate distance between points. Defaults to 2L.
n_jobs
An integer value specifying the number of parallel jobs to
run. Defaults to 1L.
An object of class DBSCAN.
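The roles of eps and min_samples described above can be illustrated with a small base R sketch. It does not call rgudhi or scikit-learn; it only reproduces the core-point rule from the argument descriptions, assuming a neighborhood includes every point at distance less than or equal to eps. The toy matrix X and the choice min_samples = 3L are made up for illustration.

```r
# Toy data (hypothetical): two tight groups of three points plus one isolated point.
X <- rbind(
  c(0.0, 0.0), c(0.1, 0.0), c(0.0, 0.1),
  c(5.0, 5.0), c(5.1, 5.0), c(5.0, 5.1),
  c(10.0, 0.0)
)

eps <- 0.5          # neighborhood radius (same value as the documented default)
min_samples <- 3L   # smaller than the documented default of 5L to suit the toy data

# Pairwise Euclidean distances as a full square matrix.
D <- as.matrix(dist(X, method = "euclidean"))

# Count the points within eps of each point. The zero diagonal means every
# point counts itself, matching "This includes the point itself".
n_neighbors <- rowSums(D <= eps)

# A point is a core point when its neighborhood holds at least min_samples points.
is_core <- n_neighbors >= min_samples

print(data.frame(n_neighbors, is_core))
```

In this sketch the six grouped points qualify as core points and the isolated point does not. Raising min_samples to the default 5L would leave no core points at all, which is why eps and min_samples need to be chosen together for a given data set.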
clone()
The objects of this class are cloneable with this method.
DBSCAN$clone(deep = FALSE)
deep
Whether to make a deep clone.
Ester, M., Kriegel, H. P., Sander, J., & Xu, X. (1996). A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise. In: Proceedings of the 2nd International Conference on Knowledge Discovery and Data Mining, Portland, OR, AAAI Press, pp. 226-231.
Schubert, E., Sander, J., Ester, M., Kriegel, H. P., & Xu, X. (2017). DBSCAN revisited, revisited: why and how you should (still) use DBSCAN, ACM Transactions on Database Systems (TODS), 42(3), p. 19.
cl <- DBSCAN$new()
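As a further illustration of the metric and p arguments, the sketch below builds Minkowski distance matrices in base R with stats::dist() and checks that the result has the square shape required when metric = "precomputed". The toy matrix X is hypothetical, and the sketch stops before fitting: how the matrix is then handed to a DBSCAN object is not covered on this page, so that step is left out rather than guessed.

```r
# Toy data (hypothetical): three points on a line through the origin.
X <- rbind(c(0, 0), c(3, 4), c(6, 8))

# Minkowski distances for two values of the power parameter p.
D_p2 <- as.matrix(dist(X, method = "minkowski", p = 2))  # p = 2: Euclidean
D_p1 <- as.matrix(dist(X, method = "minkowski", p = 1))  # p = 1: Manhattan

# A precomputed input must be square. A distance matrix is additionally
# symmetric with a zero diagonal, so check all three properties.
stopifnot(
  nrow(D_p2) == ncol(D_p2),
  isSymmetric(D_p2),
  all(diag(D_p2) == 0)
)

print(D_p2)
print(D_p1)

# With rgudhi and its Python dependencies installed, a matching clusterer
# would be constructed with the documented signature:
#   cl <- DBSCAN$new(eps = 6, metric = "precomputed")
```

Changing p changes which points fall within eps of each other, so an eps tuned for Euclidean distances generally needs retuning for another power.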