scclust-package: scclust: Size-Constrained Clustering

scclust-package    R Documentation

scclust: Size-Constrained Clustering

Description

The scclust package is an R wrapper for the scclust library. The package provides functions to construct near-optimal size-constrained clusterings. Subject to user-specified constraints on the size and composition of the clusters, scclust constructs a clustering so that within-cluster pairwise distances are minimized.
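The two kinds of constraints mentioned above can be illustrated with a minimal sketch. It assumes that the scclust and distances packages are installed; the simulated data, the variable names and the type labels ("treated", "control") are made up for illustration and are not part of the package.

```r
library(scclust)

# Simulated example data (illustrative only)
set.seed(1)
n <- 60
pts <- data.frame(x = rnorm(n), y = rnorm(n))
grp <- factor(rep(c("treated", "control"), length.out = n))

# Distance object built from the two numeric columns
d <- distances::distances(pts)

# Size constraint: every cluster must contain at least 3 data points
by_size <- sc_clustering(d, size_constraint = 3)

# Composition constraint: every cluster must contain at least
# one data point of each type
type_req <- c("treated" = 1, "control" = 1)
by_type <- sc_clustering(d,
                         type_labels = grp,
                         type_constraints = type_req)

# Verify that both clusterings satisfy their constraints
size_ok <- check_clustering(by_size, size_constraint = 3)
type_ok <- check_clustering(by_type,
                            type_labels = grp,
                            type_constraints = type_req)
```

Both size and type constraints are lower bounds, so clusters may end up larger than the requested minimum.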

Details

The main clustering function is sc_clustering. Statistics about clusters can be derived with the get_clustering_stats function. To check if a clustering satisfies some set of constraints, use check_clustering. To construct a scclust object from an existing clustering, use the scclust function.
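The functions named above could fit together roughly as follows. This is a sketch under stated assumptions rather than a canonical workflow: the data are simulated, the object names are illustrative, and the distances package is assumed to be installed.

```r
library(scclust)

# Simulated example data (illustrative only)
set.seed(2)
pts <- data.frame(x = rnorm(20), y = rnorm(20))
d <- distances::distances(pts)

# Construct a clustering with at least 2 data points in each cluster
clustering <- sc_clustering(d, size_constraint = 2)

# Summary statistics (cluster sizes, within-cluster distances)
stats <- get_clustering_stats(d, clustering)
print(stats)

# Check that the size constraint is satisfied
ok <- check_clustering(clustering, size_constraint = 2)

# Wrap labels from some other source in a scclust object.
# Here the labels are simply four consecutive groups of five points.
labels <- rep(c("a", "b", "c", "d"), each = 5)
existing <- scclust(labels)

# The same helper functions apply to the wrapped clustering
existing_ok <- check_clustering(existing, size_constraint = 5)
existing_stats <- get_clustering_stats(d, existing)
```

Wrapping an existing clustering with scclust makes it possible to compare it against a clustering from sc_clustering using the same statistics.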

Clusters can also be constructed with hierarchical_clustering. However, this function does not support type constraints and does not provide optimality guarantees. Its main use is to refine clusterings constructed with the sc_clustering function.
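A refinement step of the kind described above might look like the following sketch. It assumes the existing_clustering argument of hierarchical_clustering and the cluster_count helper as documented in the package; the data are simulated and the object names are illustrative.

```r
library(scclust)

# Simulated example data (illustrative only)
set.seed(3)
pts <- data.frame(x = rnorm(200), y = rnorm(200))
d <- distances::distances(pts)

# Initial clustering with at least 2 data points in each cluster
initial <- sc_clustering(d, size_constraint = 2)

# Refine the initial clustering: clusters that are large enough
# are split further, subject to the same size constraint
refined <- hierarchical_clustering(d,
                                   size_constraint = 2,
                                   existing_clustering = initial)

# The refined clustering should still satisfy the constraint
refined_ok <- check_clustering(refined, size_constraint = 2)

# Refinement only splits clusters, so the count cannot decrease
n_initial <- cluster_count(initial)
n_refined <- cluster_count(refined)
```

Because hierarchical_clustering does not support type constraints, a refinement like this is only appropriate when the constraints are on cluster size alone.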

scclust was made with large data sets in mind, and it can cluster tens of millions of data points within minutes on an ordinary desktop computer.

See the package's website for more information: https://github.com/fsavje/scclust-R.

More information about the scclust library is available at https://github.com/fsavje/scclust.

Bug reports and suggestions are greatly appreciated. They are best reported here: https://github.com/fsavje/scclust-R/issues.

References

Higgins, Michael J., Fredrik Sävje and Jasjeet S. Sekhon (2016), ‘Improving massive experiments with threshold blocking’, Proceedings of the National Academy of Sciences, 113:27, 7369–7376.

Sävje, Fredrik, Michael J. Higgins and Jasjeet S. Sekhon (2017), ‘Generalized Full Matching’, arXiv 1703.03882. https://arxiv.org/abs/1703.03882


scclust documentation built on Sept. 11, 2024, 6:38 p.m.