DbscanParam-class: Density-based clustering with DBSCAN
In LTLA/bluster: Clustering Algorithms for Bioconductor

DbscanParam-class

R Documentation

Density-based clustering with DBSCAN

Description

Perform density-based clustering with a fast re-implementation of the DBSCAN algorithm.

Usage

DbscanParam(
  eps = NULL,
  min.pts = 5,
  core.prop = 0.5,
  chunk.size = 1000,
  BNPARAM = KmknnParam(),
  num.threads = 1,
  BPPARAM = NULL
)

## S4 method for signature 'ANY,DbscanParam'
clusterRows(x, BLUSPARAM, full = FALSE)

Arguments

`eps`	Numeric scalar specifying the distance to use to define neighborhoods. If `NULL`, this is determined from `min.pts` and `core.prop`.
`min.pts`	Integer scalar specifying the minimum number of neighboring observations required for an observation to be a core point.
`core.prop`	Numeric scalar specifying the proportion of observations to treat as core points. This is only used when `eps=NULL`, see Details.
`chunk.size`	Integer scalar specifying the number of points to process per chunk.
`BNPARAM`	A BiocNeighborParam object specifying the algorithm to use for the neighbor searches. This should be able to support both nearest-neighbor and range queries.
`num.threads`	Integer scalar specifying the number of threads to use.
`BPPARAM`	Deprecated and ignored, use `num.threads` instead.
`x`	A numeric matrix-like object where rows represent observations and columns represent variables.
`BLUSPARAM`	A BlusterParam object specifying the algorithm to use.
`full`	Logical scalar indicating whether additional statistics should be returned.

Details

DBSCAN operates by identifying core points, i.e., observations with at least min.pts neighbors within a distance of eps. It then identifies which core points are neighbors of each other, processing chunk.size points at a time, to form components of connected core points. Each non-core point is then connected to its closest core point, provided that one lies within eps. All points that are connected in this manner are considered to be part of the same cluster. Any non-core points that remain unconnected are treated as noise and reported as NA.
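
The steps described above can be sketched in base R. This is only an illustration and not the implementation used by bluster: the function name naive_dbscan is hypothetical, distances are assumed to be Euclidean, an observation is assumed not to count as its own neighbor, and the chunking by chunk.size is omitted because all pairwise distances are computed at once.

```r
# Hypothetical helper, for illustration only; not part of bluster.
naive_dbscan <- function(x, eps, min.pts = 5) {
    d <- as.matrix(dist(as.matrix(x)))   # all pairwise Euclidean distances
    diag(d) <- Inf                       # assumption: a point is not its own neighbor
    n <- nrow(d)

    # Core points have at least 'min.pts' neighbors within 'eps'.
    is.core <- rowSums(d <= eps) >= min.pts

    # Form components of connected core points with a breadth-first search.
    labels <- rep(NA_integer_, n)
    current <- 0L
    for (i in which(is.core)) {
        if (!is.na(labels[i])) next
        current <- current + 1L
        labels[i] <- current
        queue <- i
        while (length(queue)) {
            j <- queue[1]
            queue <- queue[-1]
            nb <- which(is.core & d[j, ] <= eps & is.na(labels))
            labels[nb] <- current
            queue <- c(queue, nb)
        }
    }

    # Attach each non-core point to its closest core point, if within 'eps'.
    core.idx <- which(is.core)
    if (length(core.idx)) {
        for (i in which(!is.core)) {
            dc <- d[i, core.idx]
            best <- which.min(dc)
            if (dc[best] <= eps) labels[i] <- labels[core.idx[best]]
        }
    }

    # Unconnected non-core points keep NA and are reported as noise.
    factor(labels)
}

# Two tight groups of points on a line, plus one isolated observation.
x <- rbind(
    cbind(seq(0, 0.9, by = 0.1), 0),
    cbind(seq(10, 10.9, by = 0.1), 0),
    c(100, 100)
)
res <- naive_dbscan(x, eps = 0.35, min.pts = 3)
table(res, useNA = "ifany")
```

The quadratic distance matrix keeps the sketch short but limits it to small inputs; the BNPARAM neighbor searches and chunking in the real method exist to avoid that cost.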

As a suitable value of eps may not be known beforehand, we can automatically determine it from the data. For all observations, we compute the distance to the kth neighbor where k is defined as round(min.pts * core.prop). We then define eps as the core.prop quantile of the distances across all observations. The default of core.prop=0.5 means that around half of the observations will be treated as core points.
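
As a rough illustration of this rule, the calculation can be written in base R. The function name auto_eps is hypothetical, and the sketch assumes Euclidean distances, excludes each observation from its own neighbor list, and uses the default quantile type; the package may differ in these details.

```r
# Hypothetical helper, for illustration only; not part of bluster.
auto_eps <- function(x, min.pts = 5, core.prop = 0.5) {
    x <- as.matrix(x)
    k <- round(min.pts * core.prop)      # rank of the neighbor to use
    d <- as.matrix(dist(x))              # all pairwise Euclidean distances
    diag(d) <- Inf                       # assumption: exclude self-distances
    kth <- apply(d, 1, function(row) sort(row)[k])  # distance to the kth neighbor
    unname(quantile(kth, core.prop))     # 'core.prop' quantile across observations
}

eps.default <- auto_eps(iris[, 1:4])
eps.larger <- auto_eps(iris[, 1:4], core.prop = 0.8)
```

Under these assumptions, raising core.prop increases both k and the quantile, so the chosen eps cannot decrease.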

Larger values of eps will generally result in fewer observations classified as noise, as they are more likely to connect to a core point. Larger values may also promote agglomeration of existing clusters into larger entities if they are connected by regions of (relatively) low density. Conversely, larger values of min.pts will generally increase the number of noise points and may fragment larger clusters into subclusters.

To inspect or modify an existing DbscanParam object x, users can call x[[i]] or x[[i]] <- value, where i is the name of any argument used in the constructor.
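
A short sketch of this accessor pattern, assuming the bluster package is installed and using min.pts as the argument name; the object name param is arbitrary.

```r
library(bluster)

param <- DbscanParam()
param[["min.pts"]]          # read the current setting
param[["min.pts"]] <- 10    # replace it in the existing object
param[["min.pts"]]
```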

Value

The DbscanParam constructor will return a DbscanParam object with the specified parameters.

The clusterRows method will return a factor of length equal to nrow(x) containing the cluster assignments. Note that this may contain NA values corresponding to noise points. If full=TRUE, a list is returned with clusters (the factor, as above) and objects (a list containing the eps and min.pts used in the analysis).
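
For example, the extra statistics could be inspected as follows, assuming the bluster package is installed. The element names inside objects are not spelled out above, so str() is used here instead of assuming them.

```r
library(bluster)

out <- clusterRows(iris[, 1:4], DbscanParam(), full = TRUE)
table(out$clusters, useNA = "ifany")   # cluster sizes, with noise points counted under NA
str(out$objects)                       # parameters used in the analysis
```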

Author(s)

Aaron Lun

References

Ester M et al. (1996). A density-based algorithm for discovering clusters in large spatial databases with noise. Proceedings of the Second International Conference on Knowledge Discovery and Data Mining, 226-231.

Examples

clusterRows(iris[,1:4], DbscanParam())
clusterRows(iris[,1:4], DbscanParam(core.prop=0.8))

LTLA/bluster documentation built on Sept. 8, 2024, 4:37 a.m.
