pkg <- 'dbscan'
source("https://raw.githubusercontent.com/mhahsler/pkg_helpers/main/pkg_helpers.R")
pkg_title(pkg)
This R package provides a fast C++ (re)implementation of several density-based algorithms with a focus on the DBSCAN family for clustering spatial data. The package includes:
- Clustering
- Outlier Detection
- Fast Nearest-Neighbor Search (using kd-trees)
The implementations use the kd-tree data structure (from library ANN) for faster k-nearest neighbor search, and are typically faster than the native R implementations (e.g., dbscan in package fpc) or the implementations in WEKA, ELKI and Python's scikit-learn.
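The kd-tree based search mentioned above can also be called directly. The following is a minimal sketch, assuming the package is installed; the choice of `k = 5` and the use of the iris data are illustrative only.

```r
library("dbscan")

data("iris")
x <- as.matrix(iris[, 1:4])

# Find the 5 nearest neighbors of every point (kd-tree search is the default)
nn <- kNN(x, k = 5)

# nn$id holds the neighbor indices and nn$dist the matching distances,
# each as a matrix with one row per point and one column per neighbor
dim(nn$id)
head(nn$dist, 3)
```

Each row of `nn$dist` is sorted by increasing distance, so the last column contains the distance to the 5th nearest neighbor.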
pkg_usage(pkg)
pkg_citation(pkg, 2)
pkg_install(pkg)
Load the package and use the numeric variables in the iris dataset.
library("dbscan")
data("iris")
x <- as.matrix(iris[, 1:4])
DBSCAN
db <- dbscan(x, eps = .4, minPts = 4)
db
Visualize the resulting clustering (noise points are shown in black).
pairs(x, col = db$cluster + 1L)
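A common heuristic for choosing `eps` in DBSCAN is to inspect a sorted k-nearest-neighbor distance plot and look for a knee. The sketch below assumes `k = minPts - 1 = 3` to match the `minPts = 4` setting used in this section; the dashed reference line at 0.4 only marks the value used above and is not a computed optimum.

```r
library("dbscan")

data("iris")
x <- as.matrix(iris[, 1:4])

# Distance of every point to its 3rd nearest neighbor (k = minPts - 1)
d <- kNNdist(x, k = 3)
summary(d)

# Plot the sorted distances; a knee in the curve suggests a candidate eps
kNNdistplot(x, k = 3)
abline(h = 0.4, lty = 2)  # reference line at the eps used in this section
```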
OPTICS
opt <- optics(x, eps = 1, minPts = 4)
opt
Extract DBSCAN-like clustering from OPTICS and create a reachability plot (extracted DBSCAN clusters at eps_cl=.4 are colored).
opt <- extractDBSCAN(opt, eps_cl = .4)
plot(opt)
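To see what the reachability plot is built from, the OPTICS result can be inspected directly. This is a minimal sketch that repeats the calls from this section so it runs on its own; it only reads components of the returned object.

```r
library("dbscan")

data("iris")
x <- as.matrix(iris[, 1:4])

opt <- optics(x, eps = 1, minPts = 4)
opt <- extractDBSCAN(opt, eps_cl = .4)

# opt$order is the order in which OPTICS visited the points
head(opt$order)

# Reachability distances arranged in visiting order; these are the bar
# heights of the reachability plot (the first visited point has none)
head(opt$reachdist[opt$order])

# Cluster labels added by extractDBSCAN (0 marks noise)
table(opt$cluster)
```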
HDBSCAN
hdb <- hdbscan(x, minPts = 4)
hdb
Visualize the hierarchical clustering as a simplified tree. HDBSCAN finds 2 stable clusters.
plot(hdb, show_flat = TRUE)
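Beyond the tree plot, the HDBSCAN result carries per-point information that can be read off the returned object. The sketch below repeats the call from this section so it is self-contained; it makes no claim about the specific values returned for the iris data.

```r
library("dbscan")

data("iris")
x <- as.matrix(iris[, 1:4])

hdb <- hdbscan(x, minPts = 4)

# Flat cluster assignment (0 marks noise points)
table(hdb$cluster)

# Strength of each point's membership in its assigned cluster
head(hdb$membership_prob)

# Outlier score for each point; larger values are more outlier-like
head(hdb$outlier_scores)
```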
R, the R package dbscan, and the Python package rpy2 need to be installed.
```{python, eval = FALSE}
import pandas as pd
import numpy as np

# Load the iris data; the file has no header row, so column names are supplied
iris = pd.read_csv('https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data',
                   header = None,
                   names = ['SepalLength', 'SepalWidth', 'PetalLength', 'PetalWidth', 'Species'])
iris_numeric = iris[['SepalLength', 'SepalWidth', 'PetalLength', 'PetalWidth']]

# Import the R package dbscan through rpy2
from rpy2.robjects import packages
dbscan = packages.importr('dbscan')

# Enable conversion between pandas and R objects
from rpy2.robjects import pandas2ri
pandas2ri.activate()

# The argument is named minPts in package dbscan (MinPts is the fpc spelling)
db = dbscan.dbscan(iris_numeric, eps = 0.5, minPts = 5)
print(db)
```
```{python, eval = FALSE}
# get the cluster assignment vector
labels = np.array(db.rx('cluster'))
labels

## array([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
##         1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1,
##         1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 0, 2, 2, 0, 2, 2, 2, 2, 2,
##         2, 2, 0, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 0,
##         2, 2, 2, 2, 2, 0, 2, 2, 2, 2, 0, 2, 2, 2, 2, 2, 2, 0, 0, 2, 0, 0,
##         2, 2, 2, 2, 2, 2, 2, 0, 0, 2, 2, 2, 0, 2, 2, 2, 2, 2, 2, 2, 2, 0,
##         2, 2, 0, 0, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2]],
##       dtype=int32)
```
The dbscan package is licensed under the GNU General Public License (GPL) Version 3. The OPTICSXi R implementation was directly ported from the ELKI framework's Java implementation (GNU AGPLv3), with the permission of the original author, Erich Schubert.