RcppML is an R package for fast non-negative matrix factorization and divisive clustering using large sparse matrices. For a single-cell analysis version of RcppML's functionality, see zdebruine/singlet.
Check out the RcppML pkgdown site!
RcppML NMF is:
* The fastest NMF implementation in any language for sparse and dense matrices
* More interpretable than other implementations due to diagonal scaling
* Easy to regularize with an L1 penalty
Install from CRAN or the development version from GitHub:
install.packages('RcppML') # install CRAN version
devtools::install_github("zdebruine/RcppML") # compile dev version
NOTE: RcppML is being actively developed. Please check that your packageVersion("RcppML") is current before raising issues.
Check out the CRAN manual.
Once RcppML is installed and loaded, its C++ headers, which define its classes, can be used in the C++ files of any R package via #include <RcppML.hpp>.
Sparse matrix factorization by alternating least squares:
* Non-negativity constraints
* L1 regularization
* Diagonal scaling
* Rank-1 and rank-2 specializations (~2x faster than irlba SVD equivalents)
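To make alternating least squares with non-negativity and diagonal scaling concrete, here is a minimal base-R sketch. It is not RcppML's solver: the helper name nmf_als_sketch is hypothetical, non-negativity is enforced by clipping least-squares solutions at zero (projected ALS) rather than by an exact non-negative least squares solve, diagonal scaling is applied once after the iterations, and L1 regularization and the rank-1/rank-2 specializations are omitted.

```r
# Hypothetical helper, NOT RcppML's implementation: projected ALS for
# A ~ W diag(d) H, with negatives clipped to zero after each least-squares solve.
nmf_als_sketch <- function(A, k, maxit = 100, seed = 42) {
  set.seed(seed)
  A <- as.matrix(A)
  W <- matrix(runif(nrow(A) * k), nrow(A), k)  # random non-negative start
  ridge <- diag(1e-8, k)  # tiny ridge keeps the normal equations solvable
  for (iter in seq_len(maxit)) {
    # Update H given W: solve (W'W) H = W'A, then project onto H >= 0
    H <- solve(crossprod(W) + ridge, crossprod(W, A))
    H[H < 0] <- 0
    # Update W given H: solve (HH') W' = HA', then project onto W >= 0
    W <- t(solve(tcrossprod(H) + ridge, tcrossprod(H, A)))
    W[W < 0] <- 0
  }
  # Diagonal scaling: normalize columns of W and rows of H to unit sum and
  # collect the scale factors in d, so W %*% H == w %*% diag(d) %*% h
  dw <- colSums(W) + 1e-12
  dh <- rowSums(H) + 1e-12
  list(w = sweep(W, 2, dw, "/"), d = dw * dh, h = sweep(H, 1, dh, "/"))
}
```

Usage: fit <- nmf_als_sketch(A, k = 3), after which fit$w %*% diag(fit$d, nrow = 3) %*% fit$h approximates A. Because every factor is normalized to the same scale, d alone carries each factor's weight, which is what makes factors comparable across a model.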
Read (and cite) our bioRxiv manuscript on NMF for single-cell experiments.
The nmf function runs matrix factorization by alternating least squares in the form A ≈ WDH, where D is a diagonal scaling matrix. The project function updates W or H given the other, while the mse function calculates the mean squared error of the factor model.
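library(RcppML)
# Simulate a sparse, non-negative Matrix::dgCMatrix (runif avoids negative values)
A <- Matrix::rsparsematrix(1000, 100, 0.1, rand.x = runif)
model <- RcppML::nmf(A, k = 10)  # rank-10 factorization A ~ WDH
h0 <- predict(model, A)          # project H onto A using the model's W
evaluate(model, A)               # calculate mean squared error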
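The two operations behind updating one factor given the other and scoring the model can be sketched in base R. These helpers (project_h_sketch, mse_sketch) are hypothetical and are not RcppML's implementations; the projection solves the least-squares normal equations for H given W and clips negative entries to zero, which only approximates a true non-negative least squares solve.

```r
# Hypothetical helper: update H given W by least squares, then clip to H >= 0.
project_h_sketch <- function(A, w) {
  A <- as.matrix(A)
  h <- solve(crossprod(w), crossprod(w, A))  # solves (W'W) H = W'A
  h[h < 0] <- 0
  h
}

# Hypothetical helper: mean squared error of A ~ W diag(d) H over all entries.
mse_sketch <- function(A, w, d, h) {
  A <- as.matrix(A)
  mean((A - w %*% diag(d, nrow = length(d)) %*% h)^2)
}
```

When A is exactly W times a non-negative H, the projection recovers that H and the error is numerically zero.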
Divisive clustering by rank-2 spectral bipartitioning:
* The 2nd SVD vector is linearly related to the difference between factors in rank-2 matrix factorization
* Rank-2 matrix factorization (optional non-negativity constraints) for spectral bipartitioning, ~2x faster than irlba SVD
* Sensitive distance-based stopping criteria similar to Newman-Girvan modularity, but orders of magnitude faster
* Stopping criteria based on minimum number of samples
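A minimal base-R illustration of spectral bipartitioning: samples (columns) are split by the sign of the second right singular vector of A. This calls base::svd() directly rather than RcppML's rank-2 NMF specialization, and the helper name spectral_bipartition_sketch is hypothetical. Because the sign of a singular vector is arbitrary, which group comes first may vary between runs or platforms.

```r
# Hypothetical helper, NOT RcppML's bipartition(): split columns of A by the
# sign of the 2nd right singular vector (requires at least 2 rows and 2 columns).
spectral_bipartition_sketch <- function(A) {
  A <- as.matrix(A)
  v2 <- svd(A)$v[, 2]  # 2nd right singular vector, one entry per sample
  list(samples1 = which(v2 >= 0), samples2 = which(v2 < 0), v = v2)
}
```

On a matrix whose columns come from two distinct profiles, the sign pattern of v separates the profiles exactly.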
The dclust function runs divisive clustering by recursive spectral bipartitioning, while the bipartition function exposes the rank-2 NMF specialization and returns statistics of the bipartition.
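library(RcppML)
# Simulate a sparse, non-negative Matrix::dgCMatrix (runif avoids negative values)
A <- Matrix::rsparsematrix(1000, 1000, 0.1, rand.x = runif)
clusters <- dclust(A, min_dist = 0.001, min_samples = 5)  # recursive divisive clustering
cluster0 <- bipartition(A)  # a single rank-2 bipartition and its statistics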
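Divisive clustering applies a bipartition recursively to each half. The base-R sketch below, with the hypothetical name dclust_sketch, splits by the sign of the second right singular vector and stops only on a minimum cluster size; it omits the distance-based stopping criterion (min_dist) and is not RcppML's dclust().

```r
# Hypothetical helper, NOT RcppML's dclust(): recursive spectral bipartitioning
# of columns, stopping when a split would leave a side with < min_samples columns.
dclust_sketch <- function(A, min_samples = 5) {
  stopifnot(min_samples >= 1)  # guarantees each accepted split shrinks the set
  A <- as.matrix(A)
  split_cols <- function(cols) {
    # Too few samples to produce two clusters of at least min_samples each
    if (length(cols) < max(2, 2 * min_samples)) return(list(cols))
    v2 <- svd(A[, cols, drop = FALSE])$v[, 2]
    left <- cols[v2 >= 0]
    right <- cols[v2 < 0]
    if (length(left) < min_samples || length(right) < min_samples) {
      return(list(cols))  # reject the split; keep this node as a leaf cluster
    }
    c(split_cols(left), split_cols(right))
  }
  split_cols(seq_len(ncol(A)))  # list of integer vectors of column indices
}
```

Each leaf of the recursion becomes one cluster, so the number of clusters is decided by the stopping rule rather than fixed in advance.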