This R package provides a novel cluster extraction method for the OPTICS algorithm, OPTICS k-Xi, along with ggplot2 visualizations and a framework to compare clustering models with varying parameters using distance-based metrics.
Density-based clustering methods are well suited to high-dimensional data and can discover core groups of various shapes despite large amounts of noise.
The opticskxi R package provides a novel density-based cluster extraction method, OPTICS k-Xi, and a framework to compare k-Xi models using distance-based metrics, in order to investigate datasets with an unknown number of clusters. The vignette first introduces density-based algorithms with simulated datasets, then presents and evaluates the k-Xi cluster extraction method. Finally, the model comparison framework is described and applied to two genetic datasets to identify groups and their discriminating features.
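As a rough illustration of why density separates core groups from noise, the base R sketch below computes each point's distance to its k-th nearest neighbour on simulated data. This quantity is closely related to the core distance that OPTICS relies on, but the helper `knn_distance`, the simulated data, and the choice `k = 5` are assumptions made for this example and are not part of the package.

```r
# Simulated data: one dense Gaussian cluster plus uniform background noise
set.seed(1)
dense <- matrix(rnorm(200, sd = 0.1), ncol = 2)           # 100 clustered points
noise <- matrix(runif(100, min = -5, max = 5), ncol = 2)  # 50 noise points
x <- rbind(dense, noise)
is_dense <- rep(c(TRUE, FALSE), c(nrow(dense), nrow(noise)))

# Distance from each point to its k-th nearest neighbour
# (hypothetical helper, not an opticskxi or dbscan function)
knn_distance <- function(x, k = 5) {
  d <- as.matrix(dist(x))
  # The first sorted value is the zero distance to the point itself,
  # so the k-th neighbour sits at position k + 1
  unname(apply(d, 1, function(row) sort(row)[k + 1]))
}

kdist <- knn_distance(x, k = 5)

# Average k-th neighbour distance for noise (FALSE) and clustered (TRUE) points
tapply(kdist, is_dense, mean)
```

Points inside the dense group have much smaller neighbour distances than the background noise, which is the contrast that density-based methods exploit.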
The k-Xi algorithm is a novel OPTICS cluster extraction method that directly specifies the number of clusters and, unlike the OPTICS Xi method, does not require fine-tuning of the steepness parameter. Combined with a framework that compares models with varying parameters, the OPTICS k-Xi method can identify groups in noisy datasets with an unknown number of clusters.
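To make the idea of comparing models with a distance-based metric more concrete, here is a minimal base R sketch that scores two candidate clusterings of the same simulated data with the mean silhouette width. The silhouette is only one possible distance-based metric and is used here as an assumption; the helper `mean_silhouette` is hypothetical and does not reproduce the package's own pipeline (for instance, it does not handle noise points).

```r
# Mean silhouette width from a distance object and cluster labels
# (hypothetical helper; assumes at least two clusters and no noise labels)
mean_silhouette <- function(d, clusters) {
  d <- as.matrix(d)
  ids <- unique(clusters)
  widths <- vapply(seq_along(clusters), function(i) {
    own <- clusters == clusters[i]
    own[i] <- FALSE
    if (!any(own)) return(0)  # singleton cluster: silhouette set to 0
    # a: mean distance to the other members of the point's own cluster
    a <- mean(d[i, own])
    # b: smallest mean distance to the members of any other cluster
    b <- min(vapply(setdiff(ids, clusters[i]),
                    function(g) mean(d[i, clusters == g]), numeric(1)))
    (b - a) / max(a, b)
  }, numeric(1))
  mean(widths)
}

# Two well-separated simulated groups of 30 points each
set.seed(42)
x <- rbind(matrix(rnorm(60, mean = 0, sd = 0.3), ncol = 2),
           matrix(rnorm(60, mean = 3, sd = 0.3), ncol = 2))
d <- dist(x)

# Two candidate models: the simulated grouping and an arbitrary alternating split
good <- rep(1:2, each = 30)
poor <- rep(1:2, times = 30)

scores <- c(good = mean_silhouette(d, good), poor = mean_silhouette(d, poor))
scores
names(which.max(scores))  # label of the better-scoring model
```

Ranking candidate models by such a score is what allows a choice between parameter settings when the true number of clusters is unknown.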
Using the devtools package in R:
Compute OPTICS profile and k-Xi clustering
data('multishapes')
optics_shapes <- dbscan::optics(multishapes[1:2])
kxi_shapes <- opticskxi(optics_shapes, n_xi = 5, pts = 30)
Visualize with ggplot2
Compare multiple k-Xi models in dataset with unknown number of clusters and visualize the best models:
data('hla')
m_hla <- hla[-c(1:2)] %>% scale
df_params_hla <- expand.grid(n_xi = 3:5, pts = c(20, 30, 40),
  dist = c('manhattan', 'euclidean', 'abscorrelation', 'abspearson'))
df_kxi_hla <- opticskxi_pipeline(m_hla, df_params_hla)

ggplot_kxi_metrics(df_kxi_hla, n = 8)
gtable_kxi_profiles(df_kxi_hla) %>% plot

best_kxi_hla <- get_best_kxi(df_kxi_hla, rank = 2)
clusters_hla <- best_kxi_hla$clusters
fortify_pca(m_hla, sup_vars = data.frame(Clusters = clusters_hla)) %>%
  ggpairs('Clusters', ellipses = TRUE, variables = TRUE)
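The parameter grid passed to the pipeline in the example above can be inspected on its own with base R. The sketch below rebuilds the same grid under a hypothetical name, `df_params`, to show how many models the comparison involves. It does not call any opticskxi function.

```r
# Same grid as in the example above, built with base R only
df_params <- expand.grid(n_xi = 3:5, pts = c(20, 30, 40),
                         dist = c('manhattan', 'euclidean',
                                  'abscorrelation', 'abspearson'))

nrow(df_params)      # 3 * 3 * 4 = 36 parameter combinations
head(df_params, 3)   # first rows: n_xi varies fastest
```

Each row describes one candidate model, so the grid grows multiplicatively with every parameter value that is added.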
See the vignette for results and further details.
This work was inspired by Jérôme Wojcik (Precision for Medicine) and Sviatoslav Voloshynovskiy (University of Geneva).
This package is free and open source software, licensed under GPL-3.