DbscanParam-class | R Documentation |
Perform density-based clustering with a fast re-implementation of the DBSCAN algorithm.
DbscanParam(
eps = NULL,
min.pts = 5,
core.prop = 0.5,
chunk.size = 1000,
BNPARAM = KmknnParam(),
num.threads = 1,
BPPARAM = NULL
)
## S4 method for signature 'ANY,DbscanParam'
clusterRows(x, BLUSPARAM, full = FALSE)
eps |
Numeric scalar specifying the distance to use to define neighborhoods.
If NULL, this is determined automatically from the data using min.pts and core.prop. |
min.pts |
Integer scalar specifying the minimum number of neighboring observations required for an observation to be a core point. |
core.prop |
Numeric scalar specifying the proportion of observations to treat as core points.
This is only used when eps=NULL. |
chunk.size |
Integer scalar specifying the number of points to process per chunk. |
BNPARAM |
A BiocNeighborParam object specifying the algorithm to use for the neighbor searches. This should be able to support both nearest-neighbor and range queries. |
num.threads |
Integer scalar specifying the number of threads to use. |
BPPARAM |
Deprecated and ignored, use num.threads instead. |
x |
A numeric matrix-like object where rows represent observations and columns represent variables. |
BLUSPARAM |
A BlusterParam object specifying the algorithm to use. |
full |
Logical scalar indicating whether additional statistics should be returned. |
DBSCAN operates by identifying core points, i.e., observations with at least min.pts neighbors within a distance of eps.
It then identifies which core points are neighbors of each other, processing chunk.size points at a time, to form components of connected core points.
Each non-core point is then connected to its closest core point within eps, if one exists.
All points that are connected in this manner are considered to be part of the same cluster.
Any non-core points that remain unconnected are treated as noise and reported as NA.
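The steps above can be sketched in base R. This is a naive quadratic-time illustration only, not the package's implementation: the function name naive_dbscan is hypothetical, chunking and fast neighbor searches are omitted, and excluding each point from its own neighbor count is an assumption.

```r
# Naive illustration of the DBSCAN steps; hypothetical helper, not part of bluster.
naive_dbscan <- function(x, eps, min.pts = 5) {
    d <- as.matrix(dist(x))                 # pairwise Euclidean distances
    n <- nrow(d)
    # Count neighbors within eps, excluding the point itself (an assumption).
    n.neighbors <- rowSums(d <= eps) - 1
    is.core <- n.neighbors >= min.pts
    core.idx <- which(is.core)

    labels <- rep(NA_integer_, n)
    current <- 0L
    # Grow components of connected core points.
    for (i in core.idx) {
        if (!is.na(labels[i])) next
        current <- current + 1L
        labels[i] <- current
        queue <- i
        while (length(queue) > 0) {
            j <- queue[1]
            queue <- queue[-1]
            # Unlabelled core points within eps of the current core point.
            nb <- core.idx[d[j, core.idx] <= eps & is.na(labels[core.idx])]
            labels[nb] <- current
            queue <- c(queue, nb)
        }
    }

    # Attach each non-core point to its closest core point within eps, if any.
    if (length(core.idx) > 0) {
        for (i in which(!is.core)) {
            closest <- core.idx[which.min(d[i, core.idx])]
            if (d[i, closest] <= eps) labels[i] <- labels[closest]
        }
    }
    factor(labels)                          # NA marks noise
}

# Two tight groups of six points plus one isolated point.
offsets <- seq(0, 0.05, length.out = 6)
toy <- rbind(matrix(0, 6, 2) + offsets, matrix(10, 6, 2) + offsets, c(100, 100))
naive_dbscan(toy, eps = 1, min.pts = 5)
```

In this toy input, every point in each group has five neighbors within eps and is therefore a core point, while the isolated point has none and is reported as NA.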
As a suitable value of eps may not be known beforehand, we can automatically determine it from the data.
For all observations, we compute the distance to the k-th neighbor, where k is defined as round(min.pts * core.prop).
We then define eps as the core.prop quantile of the distances across all observations.
The default of core.prop=0.5 means that around half of the observations will be treated as core points.
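The automatic choice of eps can be approximated in base R as follows. This is a sketch under stated assumptions: the function name auto_eps is hypothetical, the guard against k = 0 and the default quantile type are assumptions, and the package may differ in such details.

```r
# Hypothetical helper illustrating the eps heuristic; not part of bluster.
auto_eps <- function(x, min.pts = 5, core.prop = 0.5) {
    # k as described in the text; forcing k >= 1 is an assumption.
    k <- max(1, round(min.pts * core.prop))
    d <- as.matrix(dist(x))
    # Distance from each observation to its k-th nearest neighbor.
    # The first entry of each sorted row is the point itself (distance 0).
    kth <- apply(d, 1, function(row) sort(row)[k + 1])
    # eps is the core.prop quantile of these distances.
    unname(quantile(kth, core.prop))
}

auto_eps(as.matrix(iris[, 1:4]))
```

Increasing core.prop raises both k and the quantile, so the resulting eps grows and a larger fraction of observations qualify as core points.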
Larger values of eps will generally result in fewer observations classified as noise, as they are more likely to connect to a core point.
It may also promote agglomeration of existing clusters into larger entities if they are connected by regions of (relatively) low density.
Conversely, larger values of min.pts will generally increase the number of noise points and may fragment larger clusters into subclusters.
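One way to explore these trade-offs is to count noise points under different settings. The following sketch assumes the bluster package is installed; the specific eps and min.pts values are arbitrary choices for illustration, and the counts depend on the data.

```r
library(bluster)

x <- as.matrix(iris[, 1:4])

# Compare a small and a large neighborhood radius.
small.eps <- clusterRows(x, DbscanParam(eps = 0.3))
large.eps <- clusterRows(x, DbscanParam(eps = 1))
c(small = sum(is.na(small.eps)), large = sum(is.na(large.eps)))

# Compare a low and a high core point threshold at a fixed eps.
low.pts <- clusterRows(x, DbscanParam(eps = 0.5, min.pts = 3))
high.pts <- clusterRows(x, DbscanParam(eps = 0.5, min.pts = 20))
c(low = sum(is.na(low.pts)), high = sum(is.na(high.pts)))
```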
To modify an existing DbscanParam object x, users can simply call x[[i]] or x[[i]] <- value, where i is any argument used in the constructor.
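For instance, parameters can be read and updated as in the sketch below, which assumes the bluster package is installed. The variable name param is arbitrary.

```r
library(bluster)

param <- DbscanParam()
param[["min.pts"]]            # inspect the current value

# Update parameters in place using the argument names from the constructor.
param[["min.pts"]] <- 10
param[["core.prop"]] <- 0.8
param
```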
The DbscanParam constructor will return a DbscanParam object with the specified parameters.
The clusterRows method will return a factor of length equal to nrow(x) containing the cluster assignments.
Note that this may contain NA values corresponding to noise points.
If full=TRUE, a list is returned with clusters (the factor, as above) and objects (a list containing the eps and min.pts used in the analysis).
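The full=TRUE output can be inspected as in the sketch below, which assumes the bluster package is installed. It is mainly useful for recovering the eps that was chosen automatically.

```r
library(bluster)

out <- clusterRows(as.matrix(iris[, 1:4]), DbscanParam(), full = TRUE)

# Cluster sizes, with noise points counted under NA.
table(out$clusters, useNA = "ifany")

# Parameters used in the analysis.
out$objects$eps
out$objects$min.pts
```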
Aaron Lun
Ester M et al. (1996). A density-based algorithm for discovering clusters in large spatial databases with noise. Proceedings of the Second International Conference on Knowledge Discovery and Data Mining, 226-231.
library(bluster)
# Default parameters; eps is determined automatically from the data.
clusterRows(iris[,1:4], DbscanParam())
clusterRows(iris[,1:4], DbscanParam(core.prop=0.8))