nmf | R Documentation |
High-performance NMF of the form A = wdh
for large dense or sparse matrices, returns an object of class nmf
.
nmf(
data,
k,
tol = 1e-04,
maxit = 100,
L1 = c(0, 0),
L2 = c(0, 0),
seed = NULL,
mask = NULL,
...
)
data |
dense or sparse matrix of features in rows and samples in columns. Prefer |
k |
rank |
tol |
tolerance of the fit |
maxit |
maximum number of fitting iterations |
L1 |
LASSO penalties in the range (0, 1], single value or array of length two for |
L2 |
Ridge penalties greater than zero, single value or array of length two for |
seed |
single initialization seed or array, or a matrix or list of matrices giving initial |
mask |
dense or sparse matrix of values in |
... |
development parameters |
This fast NMF implementation decomposes a matrix A
into lower-rank non-negative matrices w
and h
,
with columns of w
and rows of h
scaled to sum to 1 via multiplication by a diagonal, d
:
A = wdh
The scaling diagonal ensures convex L1 regularization, consistent factor scalings regardless of random initialization, and model symmetry in factorizations of symmetric matrices.
The factorization model is randomly initialized. w
and h
are updated by alternating least squares.
RcppML achieves high performance using the Eigen C++ linear algebra library, OpenMP parallelization, a dedicated Rcpp sparse matrix class, and fast sequential coordinate descent non-negative least squares initialized by Cholesky least squares solutions.
Sparse optimization is automatically applied if the input matrix A
is a sparse matrix (i.e. Matrix::dgCMatrix
). There are also specialized back-ends for symmetric, rank-1, and rank-2 factorizations.
L1 penalization can be used for increasing the sparsity of factors and assisting interpretability. Penalty values should range from 0 to 1, where 1 gives complete sparsity.
Set options(RcppML.verbose = TRUE)
to print model tolerances to the console after each iteration.
Parallelization is applied with OpenMP using the number of threads in getOption("RcppML.threads")
and set by option(RcppML.threads = 0)
, for example. 0
corresponds to all threads, let OpenMP decide.
object of class nmf
w
feature factor matrix
d
scaling diagonal vector
h
sample factor matrix
misc
list often containing components:
tol : tolerance of fit
iter : number of fitting updates
runtime : runtime in seconds
mse : mean squared error of model (calculated for multiple starts only)
w_init : initial w matrix used for model fitting
S4 methods available for the nmf
class:
predict
: project an NMF model (or partial model) onto new samples
evaluate
: calculate mean squared error loss of an NMF model
summary
: data.frame
giving fractional
, total
, or mean
representation of factors in samples or features grouped by some criteria
align
: find an ordering of factors in one nmf
model that best matches those in another nmf
model
prod
: compute the dense approximation of input data
sparsity
: compute the sparsity of each factor in w
and h
subset
: subset, reorder, select, or extract factors (same as [
)
generics such as dim
, dimnames
, t
, show
, head
Zach DeBruine
DeBruine, ZJ, Melcher, K, and Triche, TJ. (2021). "High-performance non-negative matrix factorization for large single-cell data." BioRXiv.
Lin, X, and Boutros, PC (2020). "Optimization and expansion of non-negative matrix factorization." BMC Bioinformatics.
Lee, D, and Seung, HS (1999). "Learning the parts of objects by non-negative matrix factorization." Nature.
Franc, VC, Hlavac, VC, Navara, M. (2005). "Sequential Coordinate-Wise Algorithm for the Non-negative Least Squares Problem". Proc. Int'l Conf. Computer Analysis of Images and Patterns. Lecture Notes in Computer Science.
project
, mse
, nnls
## Not run:
# basic NMF
model <- nmf(rsparsematrix(1000, 100, 0.1), k = 10)
# compare rank-2 NMF to second left vector in an SVD
data(iris)
A <- Matrix::as(as.matrix(iris[, 1:4]), "dgCMatrix")
nmf_model <- nmf(A, 2, tol = 1e-5)
bipartitioning_vector <- apply(nmf_model$w, 1, diff)
second_left_svd_vector <- base::svd(A, 2)$u[, 2]
abs(cor(bipartitioning_vector, second_left_svd_vector))
# compare rank-1 NMF with first singular vector in an SVD
abs(cor(nmf(A, 1)$w[, 1], base::svd(A, 2)$u[, 1]))
# symmetric NMF
A <- crossprod(rsparsematrix(100, 100, 0.02))
model <- nmf(A, 10, tol = 1e-5, maxit = 1000)
plot(model$w, t(model$h))
# see package vignette for more examples
## End(Not run)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.