Description
Fast and versatile sparse matrix factorization with one- or two-sided least squares constraints, L1/L2/angular regularization, distributional weighting, binary zero-masking, parallelization, and multiple initialization options.
Usage

lsmf(
  A,
  k,
  lower.limits = c(0, 0),
  upper.limits = c(NULL, NULL),
  constraint.type = list("continuous", "continuous"),
  alpha.L1 = c(0, 0),
  alpha.L2 = c(0, 0),
  alpha.angular = c(0, 0),
  alpha.elasticnet = c(NULL, NULL),
  alpha.L0 = c(k, k),
  mask.zeros = FALSE,
  weights = NULL,
  sample.weights = NULL,
  feature.weights = NULL,
  n.threads = 0,
  init = "random",
  seed = 123,
  start.W = TRUE,
  tol = 1e-04,
  scd.tol = 1e-08,
  maxit = 10000,
  scd.maxit = 100,
  trace = 1,
  adaptive.constraint = TRUE,
  adaptive.bernoulli.steps = 1,
  adaptive.alpha = TRUE,
  adaptive.tol = 0.01,
  adaptive.maxit = 10,
  adaptive.trace = 3,
  adaptive.L1.step = 0.1,
  adaptive.angular.step = 0.1,
  adaptive.L0.step = 1,
  return.sparse = c(FALSE, FALSE),
  verbose = TRUE,
  show.warnings = TRUE
)
Arguments

A
    matrix of "features x samples" as "rows x columns", of class dgCMatrix or coercible to sparse dgCMatrix

k
    rank of the factorization

lower.limits
    lower limits for coefficients. Specify NULL, a single numeric, or an array of two for c(W, H); the default of c(0, 0) is equivalent to NMF

upper.limits
    upper limits for coefficients. Specify NULL, a single numeric, or an array of two for c(W, H)

constraint.type
    list of two for list(W, H), with components taking values from c("continuous", "bernoulli", "bernoulli.signed", or a multinomial array of values to constrain to). Default is list("continuous", "continuous"). If not "continuous", "lower.limits" and "upper.limits" are disregarded

alpha.L1
    lasso regularization on W and H; a single numeric or an array of two for c(W, H), typically in the range [0, 1]

alpha.L2
    ridge regularization on W and H; a single numeric or an array of two for c(W, H), typically in the range [0, 1]

alpha.angular
    angular regularization on W and H; a single numeric or an array of two for c(W, H), in the range [0, 1]

alpha.elasticnet
    elastic net mixing parameter for ((1 - alpha) * L2 + alpha * L1) regularization on W and H; an array of two for c(W, H). Default is c(NULL, NULL). Supersedes alpha.L1 and alpha.L2

alpha.L0
    cardinality of non-zero coefficients in each row of W or column of H; a single numeric or an array of two for c(W, H). The default of c(k, k) results in no penalization

mask.zeros
    regard zeros as missing/masked values

weights
    matrix of the same dimensions as "A", of class dgCMatrix or coercible to dgCMatrix, with weights for model learning in the range [0, 1], where 0 = completely masked and 1 = completely weighted

sample.weights
    vector of length ncol(A) with values in the range [0, 1]

feature.weights
    vector of length nrow(A) with values in the range [0, 1]

n.threads
    number of threads/CPUs to use, or leave as 0 or NULL to let OpenMP decide

init
    initialization for W and H; one of c("random", "nndsvd", "nndsvda", "nndsvdar"), or a list of two matrices of class "matrix" as initializations for list(W, H)

seed
    seed for init = "random" initialization

start.W
    should alternating least squares start by updating W (the default), or by updating H?

tol
    stopping criterion based on the relative error between two successive iterations: |e2 - e1| / mean(e1, e2)

scd.tol
    stopping criterion for the sequential coordinate descent least squares solver between two successive iterations

maxit
    permitted number of alternating least squares iterations if the "tol" criterion remains unsatisfied

scd.maxit
    permitted number of sequential coordinate descent iterations if the "scd.tol" criterion remains unsatisfied
trace
    check error for convergence every "trace" iterations (checking error is costly). To not check error at all and simply run to "maxit", set "trace" to a value greater than "maxit"
adaptive.constraint
    gradually impose constraints? Applies only if "constraint.type" is not "continuous"
adaptive.bernoulli.steps
    number of adaptive steps over which to impose the constraint; applies only if "constraint.type" is "bernoulli" or "bernoulli.signed"
adaptive.alpha
    gradually introduce alpha penalties? Does not apply to "alpha.L2"; see "adaptive.L1.step", "adaptive.angular.step", and "adaptive.L0.step"
adaptive.tol
    stopping criterion for each adaptive update, analogous to "tol"

adaptive.maxit
    permitted number of iterations for each adaptive update, analogous to "maxit"

adaptive.trace
    check error every "adaptive.trace" iterations during adaptive updates, analogous to "trace"
adaptive.L1.step
    double; for stepwise introduction of the L1 penalty, how much to increment the penalty with each update

adaptive.angular.step
    double; for stepwise introduction of the angular penalty, how much to increment the penalty with each update

adaptive.L0.step
    integer; for stepwise introduction of the L0 penalty, how many more values to set to 0 in each update

return.sparse
    convert W and/or H to dgCMatrix. Boolean or an array of booleans for c(W, H)

verbose
    boolean, print progress updates

show.warnings
    display warnings from checks for improper or questionable usage of input parameters
Details

LSMF runs matrix factorization using alternating least squares with an efficient least squares coordinate descent solver. Matrix operations make use of the Armadillo linear algebra library in a parallelized C++ backend. Sparse matrix support for non-zero-masked factorizations means the entire dense matrix never needs to be constructed in memory.
Basic usage is documented below. For details see the vignette and the references section. To report issues or seek case-by-case optimization, open a GitHub issue.
—————————————
NON-NEGATIVE MATRIX FACTORIZATION
Set lower.limits = 0 and specify, at minimum, A and k.
When to use: data with meaningfully non-negative counts.
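For example, a minimal call (assuming "A" is a sparse dgCMatrix of counts; k = 10 is an arbitrary illustrative rank):

# NMF at rank 10; lower.limits already defaults to c(0, 0)
model <- lsmf(A, k = 10)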
—————————————
ONE-SIDED NMF (OSNMF-R or OSNMF-L)
Set lower.limits = c(1, 0) for OSNMF-left, or lower.limits = c(0, 1) for OSNMF-right.
When to use: clustering, graphing.
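A sketch of both variants, following the settings above (rank 2 is an arbitrary choice suited to clustering):

# one-sided factorizations as described above
model.L <- lsmf(A, k = 2, lower.limits = c(1, 0))  # OSNMF-left
model.R <- lsmf(A, k = 2, lower.limits = c(0, 1))  # OSNMF-right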
—————————————
SYMMETRIC NMF
Set alpha.L2 = c(1, 0), and set any other W/H parameters to the same value for both W and H. Do not specify a value for alpha.elasticnet.
An L2 penalty of 1 causes W to converge towards H such that the slope between factors in W and H is 1. As W converges towards H, H counter-converges towards the values which give the lowest MSE.
The lower tol is set (e.g. 1e-5), the better the linearity between W and H.
If signal is random or very fuzzy, the linear fit may never converge.
If needed, calculate the element-wise mean of W and t(H) as an approximation of the "perfect" factorization.
When to use: whenever factorizing a symmetric dataset, always set alpha.L2 = c(1, 0). The only exception that comes to mind is OSNMF-R for clustering on H in a rank-2 factorization.
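For example, a sketch assuming "S" is a symmetric similarity matrix, with the element-wise mean of W and t(H) taken as the approximation described above:

# symmetric factorization: the L2 penalty on W pulls W towards H
model <- lsmf(S, k = 10, alpha.L2 = c(1, 0), tol = 1e-5)
# approximate the "perfect" symmetric factor
W.mean <- (model$W + t(model$H)) / 2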
—————————————
ZERO-MASKED NMF
Set mask.zeros = TRUE.
If any of weights, sample.weights, or feature.weights is not NULL, the sparse matrix computation advantage is lost, even though zeros will still be assigned a weight of 0 (completely masked).
While ZMNMF is significantly faster, distribution-based weighting should be preferred in most use cases.
When to use: imputation on zero-inflated data that is naturally dense, or extremely sparse data with robust signal.
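A minimal sketch, assuming zeros in "A" arise from dropout rather than true absence:

# treat zeros as masked rather than observed
model <- lsmf(A, k = 10, mask.zeros = TRUE)
# impute masked entries from the reconstruction
A.imputed <- model$W %*% model$H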
—————————————
LASSO REGULARIZATION
Set alpha.L1 = c([0, n], [0, n]) for c(W, H), typically a value > 0 and < 1.
Sparsity is enforced on the model as a whole (W, H, or both), not on individual factors.
Over-penalization readily encourages a few dense factors and many extremely sparse factors.
Cross-validation against an experimental objective is important.
Some samples or features may receive all-zero coefficients.
When to use:
Extract robust factors from data with high dropout and noisy signal, where robustness is not achieved with default MF
Introduce sparsity in rank-1 or rank-2 factorization for soft clustering
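A sketch of the basic setting (0.5 is an arbitrary starting value; cross-validate as noted above):

# L1 penalty on W only, leaving H unpenalized
model <- lsmf(A, k = 10, alpha.L1 = c(0.5, 0))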
—————————————
RIDGE REGULARIZATION
Set alpha.L2 = c([0, n], [0, n]) for c(W, H), typically a value > 0 and < 1.
In matrix factorization, ridge regularization encourages the distribution of values in W and H to overlap.
Optimal results are achieved when one of W or H, generally the first to be initialized (W by default), is set to 1 and the other is set to 0 (i.e. alpha.L2 = c(1, 0)).
When to use (given alpha.L2 = c(1, 0) with start.W = TRUE):
Symmetric data should almost always be factorized with alpha.L2 = c(1, 0)
Learning factor models across independent experiments which can be compared without the need for normalization or transformation (i.e. the contributions of sample weights and feature weights are known to be 1:1)
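A sketch of the recommended setting:

# ridge penalty on the first-updated matrix (W, since start.W = TRUE)
model <- lsmf(A, k = 10, alpha.L2 = c(1, 0), start.W = TRUE)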
—————————————
RANK-1 MATRIX FACTORIZATION
Set k = 1.
Note that non-negative count data will never give negative values in "W" or "H", even if lower.limits = c(-Inf, -Inf).
Over- or under-determined rank-k on W and rank-1 on H factorizations may be generated by first solving a rank-k factorization, and then a projection for H with k = 1 using LSMF::ls.project.
When to use: divisive clustering, diffusion coordinate assignment, trajectory mapping.
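A sketch of the two-step recipe above. The ls.project() interface shown here is an assumption (projecting "A" onto a fixed W at rank 1); consult its help page for the actual signature:

# rank-k factorization for W, then a rank-1 projection for H
model <- lsmf(A, k = 5)              # arbitrary k
H1 <- ls.project(A, model$W, k = 1)  # assumed interface; see ?ls.project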
—————————————
Methods in LSMF are built on top of a computational framework for ANNLS NMF refined and published by Xihui "Eric" Lin in the NNLM R package. LSMF specifically uses alternating non-negative least squares with a coordinate descent least squares solver (the SCD method) against a mean squared error (MSE) loss.
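As a conceptual illustration (plain R, not the package's parallelized C++ implementation), one SCD pass for the non-negative least squares subproblem min ||a - W h||^2, h >= 0, for a single column h of H with W held fixed, cycles over coordinates:

scd_nnls_pass <- function(W, a, h) {
  WtW <- crossprod(W)     # t(W) %*% W, k x k Gram matrix
  Wta <- crossprod(W, a)  # t(W) %*% a, k x 1
  for (q in seq_along(h)) {
    # exact minimizer of the quadratic in h[q] with other coordinates fixed,
    # clipped at the non-negativity bound
    num  <- Wta[q] - sum(WtW[q, -q] * h[-q])
    h[q] <- max(0, num / WtW[q, q])
  }
  h
}

Repeating such passes until the change in h falls below scd.tol (or scd.maxit is reached), for every column of H and then every row of W, gives the alternating update scheme.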
Value

A list of:
W : feature coefficients, class "matrix"
H : sample coefficients, class "matrix"
W.init : initialization used for W
H.init : initialization used for H
weights : weights used (if applicable)
mse : mean squared error with applicable penalties and weights
rel.err : error of last iteration relative to previous iteration
call : list of interpreted parameter values
Author(s)

Zach DeBruine, zach.debruine@vai.org
References

Lin, X., and Boutros, P.C. (2020). "Optimization and expansion of non-negative matrix factorization." BMC Bioinformatics.
Kuang, D., Yun, S., and Park, H. (2015). "SymNMF: nonnegative low-rank approximation of a similarity matrix for graph clustering." Journal of Global Optimization.
Lee, D.D., and Seung, H.S. (1999). "Learning the parts of objects by non-negative matrix factorization." Nature.
See Also

Alternating least squares factorization
Examples

## Not run:
data(moca7k)
# run lsmf with default parameters
model <- lsmf(moca7k, k = 10, trace = 5)
# with L1 regularization on W and a larger tolerance
model <- lsmf(moca7k, k = 10, alpha.L1 = c(0.5, 0), trace = 5, tol = 1e-2)
# prepare a similarity of first 1000 samples
# in moca7k dataset, and induce 75% dropout
library(Matrix)
data(moca7k)
m.init <- as(sparse.cos(moca7k[,1:1000]), "dgCMatrix")
m <- m.init * rsparsematrix(1000, 1000, 0.25, symmetric = TRUE, rand.x = NULL)
diag(m) <- 1
# run lsmf.sym with default parameters
model <- lsmf.sym(m, k = 10)
# compare similarity between W.mean and W
mean(diag(cor(model$W, model$W.mean)))
# [1] 0.994232
# plot W against t(H)
plot(model$W, t(model$H))
# the plot looks pretty good, but notice the shift of low-weight values
# towards W; this is because W is initialized first.
# lowering the tolerance removes that shift, for instance:
model2 <- lsmf.sym(m, k = 10, tol = 1e-6)
plot(model2$W, t(model2$H))
# basically a perfect fit!
# let's factorize it again and see if we get the same result,
# note that factor orderings may be scrambled
model3 <- lsmf.sym(m, k = 10, tol = 1e-6)
model3.reordered <- match.factors(model2$W.mean, model3$W.mean)$m.aligned
# plot first matched pair
plot(model3.reordered, model2$W.mean)
# in general there is decent agreement, but multiple factorizations
# would be useful for finding a robust minimum
# note that some factors are highly robust while others are not
plot(diag(cor(model3.reordered, model2$W.mean)))
# take, for instance, the best-fit and worst-fit vectors and compare them
best.pair <- which.max(diag(cor(model3.reordered, model2$W.mean)))
worst.pair <- which.min(diag(cor(model3.reordered, model2$W.mean)))
plot(model3.reordered[, best.pair], model2$W.mean[, best.pair])
plot(model3.reordered[, worst.pair], model2$W.mean[, worst.pair])
# ZERO-MASKING
# Examine zero-masking for imputing signal by comparing "m" to "m.init",
# since signal dropout in "m" is 75%
model.orig <- lsmf.sym(m.init, k = 10, tol = 1e-6)
model.zm <- lsmf.sym(m, k = 10, tol = 1e-6, mask.zeros = TRUE)
model.nzm <- lsmf.sym(m, k = 10, tol = 1e-6, mask.zeros = FALSE)
# compare model.zm and model.nzm to model.orig, after matching
W.orig <- model.orig$W
W.zm <- match.factors(W.orig, model.zm$W)$m.aligned
W.nzm <- match.factors(W.orig, model.nzm$W)$m.aligned
plot(W.orig, W.zm)
plot(W.orig, W.nzm)
# zero-masking imputes signal dropout excellently!
## End(Not run)