nmf: Non-negative matrix factorization
In zdebruine/RcppML: Rcpp Machine Learning Library

View source: R/nmf.R

nmf	R Documentation

Non-negative matrix factorization

Description

High-performance NMF of the form A = wdh for large dense or sparse matrices, returns an object of class nmf.

Usage

nmf(
  data,
  k,
  tol = 1e-04,
  maxit = 100,
  L1 = c(0, 0),
  L2 = c(0, 0),
  seed = NULL,
  mask = NULL,
  ...
)

Arguments

`data`	dense or sparse matrix of features in rows and samples in columns. Prefer `matrix` or `Matrix::dgCMatrix`, respectively
`k`	rank
`tol`	tolerance of the fit
`maxit`	maximum number of fitting iterations
`L1`	LASSO penalties in the range (0, 1], single value or array of length two for `c(w, h)`
`L2`	Ridge penalties greater than zero, single value or array of length two for `c(w, h)`
`seed`	single initialization seed or array, or a matrix or list of matrices giving initial `w`. For multiple initializations, the model with least mean squared error is returned.
`mask`	dense or sparse matrix of values in `data` to handle as missing. Prefer `Matrix::dgCMatrix`. Alternatively, specify "`zeros`" or "`NA`" to mask either all zeros or NA values.
`...`	development parameters

Details

This fast NMF implementation decomposes a matrix A into lower-rank non-negative matrices w and h, with columns of w and rows of h scaled to sum to 1 via multiplication by a diagonal, d:

A = wdh

The scaling diagonal ensures convex L1 regularization, consistent factor scalings regardless of random initialization, and model symmetry in factorizations of symmetric matrices.

The factorization model is randomly initialized. w and h are updated by alternating least squares.

RcppML achieves high performance using the Eigen C++ linear algebra library, OpenMP parallelization, a dedicated Rcpp sparse matrix class, and fast sequential coordinate descent non-negative least squares initialized by Cholesky least squares solutions.

Sparse optimization is automatically applied if the input matrix A is a sparse matrix (i.e. Matrix::dgCMatrix). There are also specialized back-ends for symmetric, rank-1, and rank-2 factorizations.

L1 penalization can be used for increasing the sparsity of factors and assisting interpretability. Penalty values should range from 0 to 1, where 1 gives complete sparsity.

Set options(RcppML.verbose = TRUE) to print model tolerances to the console after each iteration.

Parallelization is applied with OpenMP using the number of threads in getOption("RcppML.threads") and set by option(RcppML.threads = 0), for example. 0 corresponds to all threads, let OpenMP decide.

Value

object of class nmf

Slots

w

feature factor matrix

d

scaling diagonal vector

h

sample factor matrix

misc

list often containing components:

tol : tolerance of fit
iter : number of fitting updates
runtime : runtime in seconds
mse : mean squared error of model (calculated for multiple starts only)
w_init : initial w matrix used for model fitting

Methods

S4 methods available for the nmf class:

predict: project an NMF model (or partial model) onto new samples
evaluate: calculate mean squared error loss of an NMF model
summary: data.frame giving fractional, total, or mean representation of factors in samples or features grouped by some criteria
align: find an ordering of factors in one nmf model that best matches those in another nmf model
prod: compute the dense approximation of input data
sparsity: compute the sparsity of each factor in w and h
subset: subset, reorder, select, or extract factors (same as [)
generics such as dim, dimnames, t, show, head

Author(s)

Zach DeBruine

References

DeBruine, ZJ, Melcher, K, and Triche, TJ. (2021). "High-performance non-negative matrix factorization for large single-cell data." BioRXiv.

Lin, X, and Boutros, PC (2020). "Optimization and expansion of non-negative matrix factorization." BMC Bioinformatics.

Lee, D, and Seung, HS (1999). "Learning the parts of objects by non-negative matrix factorization." Nature.

Franc, VC, Hlavac, VC, Navara, M. (2005). "Sequential Coordinate-Wise Algorithm for the Non-negative Least Squares Problem". Proc. Int'l Conf. Computer Analysis of Images and Patterns. Lecture Notes in Computer Science.

Examples

## Not run: 
# basic NMF
model <- nmf(rsparsematrix(1000, 100, 0.1), k = 10)

# compare rank-2 NMF to second left vector in an SVD
data(iris)
A <- Matrix::as(as.matrix(iris[, 1:4]), "dgCMatrix")
nmf_model <- nmf(A, 2, tol = 1e-5)
bipartitioning_vector <- apply(nmf_model$w, 1, diff)
second_left_svd_vector <- base::svd(A, 2)$u[, 2]
abs(cor(bipartitioning_vector, second_left_svd_vector))

# compare rank-1 NMF with first singular vector in an SVD
abs(cor(nmf(A, 1)$w[, 1], base::svd(A, 2)$u[, 1]))

# symmetric NMF
A <- crossprod(rsparsematrix(100, 100, 0.02))
model <- nmf(A, 10, tol = 1e-5, maxit = 1000)
plot(model$w, t(model$h))
# see package vignette for more examples

## End(Not run)

zdebruine/RcppML documentation built on Sept. 13, 2023, 11:44 p.m.