lsmf: Least squares matrix factorization


View source: R/lsmf.R

Description

Fast and versatile sparse matrix factorization with one- or two-sided least squares constraints, L1/L2/angular regularization, distributional weighting, binary zero-masking, parallelization, and multiple initialization options

Usage

lsmf(
  A,
  k,
  lower.limits = c(0, 0),
  upper.limits = c(NULL, NULL),
  constraint.type = list("continuous", "continuous"),
  alpha.L1 = c(0, 0),
  alpha.L2 = c(0, 0),
  alpha.angular = c(0, 0),
  alpha.elasticnet = c(NULL, NULL),
  alpha.L0 = c(k, k),
  mask.zeros = FALSE,
  weights = NULL,
  sample.weights = NULL,
  feature.weights = NULL,
  n.threads = 0,
  init = "random",
  seed = 123,
  start.W = TRUE,
  tol = 1e-04,
  scd.tol = 1e-08,
  maxit = 10000,
  scd.maxit = 100,
  trace = 1,
  adaptive.constraint = TRUE,
  adaptive.bernoulli.steps = 1,
  adaptive.alpha = TRUE,
  adaptive.tol = 0.01,
  adaptive.maxit = 10,
  adaptive.trace = 3,
  adaptive.L1.step = 0.1,
  adaptive.angular.step = 0.1,
  adaptive.L0.step = 1,
  return.sparse = c(FALSE, FALSE),
  verbose = TRUE,
  show.warnings = TRUE
)

Arguments

A

matrix of "features x samples" as "rows x columns" of or coercible to sparse dgCMatrix

k

rank

lower.limits

Lower limits for coefficients. Specify NULL, a single numeric, or an array of two for c(W, H), default of c(0, 0) is equivalent to NMF

upper.limits

Upper limits for coefficients. Specify NULL, a single numeric or an array of two for c(W, H)

constraint.type

List of two for list(W, H), with components taking values "continuous", "bernoulli", "bernoulli.signed", or a multinomial array of values to constrain to. Default is list("continuous", "continuous"). If not "continuous", "lower.limits" and "upper.limits" are disregarded.
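
For example (A and k here are illustrative):

# binary (0/1) constraint on H, continuous W
model <- lsmf(A, k = 10, constraint.type = list("continuous", "bernoulli"))
# constrain H to a custom multinomial set of values
model <- lsmf(A, k = 10, constraint.type = list("continuous", c(0, 0.5, 1)))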

alpha.L1

lasso regularization on W and H, a single numeric or an array of two for c(W, H) typically in the range [0,1]

alpha.L2

ridge regularization on W and H, a single numeric or an array of two for c(W, H) typically in the range [0,1]

alpha.angular

angular regularization on W and H, a single numeric or an array of two for c(W, H) in the range [0,1]

alpha.elasticnet

elastic net mixing parameter for ((1 - alpha) * L2 + alpha * L1) regularization on W and H, array of two for c(W, H). Default is c(NULL, NULL). Supersedes alpha.L1 and alpha.L2
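
For example, an equal L1/L2 mix on both factors (values illustrative):

model <- lsmf(A, k = 10, alpha.elasticnet = c(0.5, 0.5))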

alpha.L0

cardinality of non-zero coefficients in each row of W or column of H, specify a single numeric or an array of two for c(W, H), default c(k, k) results in no penalization

mask.zeros

regard zeros as missing/masked values

weights

matrix of the same dimensions as "A", of class dgCMatrix or coercible to it, with weights for model learning in the range [0,1] where 0 = completely masked and 1 = completely weighted

sample.weights

vector of length ncol(A) with values in the range [0,1]

feature.weights

vector of length nrow(A) with values in the range [0,1]

n.threads

number of threads/CPUs to use, or leave 0 or NULL to let OpenMP decide

init

initialization for W and H, one of c("random", "nndsvd", "nndsvda", "nndsvdar") or a list of two matrices of class "matrix" as initializations for list(W, H). See LSMF::nndsvd and LSMF::rand.init for methodological details. Random initialization properly handles constraint.type other than "continuous"; the nndsvd methods do not.

seed

seed for init = "random" initialization (default 123)

start.W

should alternating least squares start with updating W, as is default, or with H

tol

stopping criterion based on relative error between two successive iterations: |e2-e1|/mean(e1,e2)
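
In R terms, with e1 and e2 the errors at two successive checks, iteration stops when:

abs(e2 - e1) / mean(c(e1, e2)) < tol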

scd.tol

stopping criterion for sequential coordinate descent least squares solver between two successive iterations

maxit

permitted number of alternating least squares iterations if the "tol" criterion remains unsatisfied

scd.maxit

permitted number of sequential coordinate descent iterations if the "scd.tol" criterion remains unsatisfied

trace

check error for convergence every trace iterations (checking error is costly). To not check error at all and simply execute maxit iterations, set trace to NULL or 0

adaptive.constraint

gradually impose constraints? Applies only if constraint.type != "continuous"

adaptive.bernoulli.steps

applies only if constraint.type = list("continuous", "bernoulli"); how many factorizations to perform on multinomial distributions between c(0.5, 0.5) and c(0, 1). For example, 1 step corresponds to factorizing with a multinomial distribution on H of c(0.25, 0.75), while 4 steps corresponds to factorizing with multinomial distributions on H of c(0.4, 0.6), c(0.3, 0.7), c(0.2, 0.8), and c(0.1, 0.9)
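
The sequence of intermediate distributions can be sketched in R as follows (a hypothetical helper, not a package function):

steps <- 4
p <- seq(0.5, 0, length.out = steps + 2)[2:(steps + 1)]
lapply(p, function(x) c(x, 1 - x))
# c(0.4, 0.6), c(0.3, 0.7), c(0.2, 0.8), c(0.1, 0.9)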

adaptive.alpha

gradually introduce alpha penalties? (does not apply to alpha.L2)

adaptive.tol

tol for convergence in each adaptive update step. Also applies to adaptive.constraint

adaptive.maxit

maxit for each adaptive update step

adaptive.trace

trace for each adaptive update step. To not check error in adaptive updating steps and simply execute maxit iterations, set adaptive.trace = adaptive.maxit

adaptive.L1.step

double, for stepwise introduction of L1 penalty, how much penalty to increment with each update

adaptive.angular.step

double, for stepwise introduction of angular penalty, how much penalty to increment with each update

adaptive.L0.step

integer, for stepwise introduction of L0 penalty, how many more values to set to 0 in each update

return.sparse

convert W and/or H to dgCMatrix. Boolean or array of boolean for c(W, H)

verbose

boolean

show.warnings

Display warnings from checks for improper or questionable usage of input parameters.

Details

LSMF runs matrix factorization using alternating least squares with an efficient least squares coordinate descent solver. Matrix operations use the Armadillo linear algebra library in a parallelized C++ backend. Sparse matrix support for non-zero-masked factorizations means the entire dense matrix never needs to be held in memory.

Basic usage is documented below. For details see the vignette and the references section. To report issues or seek case-by-case optimization, open a GitHub issue.

—————————————

NON-NEGATIVE MATRIX FACTORIZATION

When to use: data with meaningfully non-negative counts
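
A minimal call; the default lower.limits = c(0, 0) already gives non-negative factors, so this is standard NMF (A and k are illustrative):

model <- lsmf(A, k = 10)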

—————————————

ONE-SIDED NMF (OSNMF-R or OSNMF-L)

When to use: clustering, graphing
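
A sketch under an assumption: supposing one-sided non-negativity is requested by lifting the lower bound on one factor (the exact syntax is not confirmed here; see the vignette):

# hypothetical: H non-negative, W unbounded below
model <- lsmf(A, k = 10, lower.limits = c(-Inf, 0))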

—————————————

SYMMETRIC NMF

When to use: whenever factorizing a symmetric dataset; always set alpha.L2 = c(1, 0). The only exception that comes to mind is OSNMF/R for clustering on H in a rank-2 factorization.
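
For example, on a symmetric similarity matrix A.sym (an illustrative name):

model <- lsmf(A.sym, k = 10, alpha.L2 = c(1, 0))

The examples below also use lsmf.sym for symmetric inputs.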

—————————————

ZERO-MASKED NMF

When to use: imputation on zero-inflated data that is naturally dense, or extremely sparse data with robust signals
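
A minimal call (A and k are illustrative):

model <- lsmf(A, k = 10, mask.zeros = TRUE)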

—————————————

LASSO REGULARIZATION

When to use: promoting sparsity in the coefficients of W and/or H
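
For example, to penalize only W (the value 0.5 is illustrative):

model <- lsmf(A, k = 10, alpha.L1 = c(0.5, 0))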

—————————————

RIDGE REGULARIZATION

When to use (given alpha.L2 = c(1, 0) when start.W = TRUE): factorizing symmetric data, where the ridge penalty on W drives W and H towards a common solution
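
For example (A and k are illustrative):

model <- lsmf(A, k = 10, alpha.L2 = c(1, 0))  # ridge on W only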

—————————————

RANK-1 MATRIX FACTORIZATION

When to use: Divisive clustering, diffusion coordinate assignment, trajectory mapping
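
A minimal sketch (A is illustrative); a rank-1 factorization simply means k = 1:

model <- lsmf(A, k = 1)
# the single factor pair model$W and model$H can then be used to split or order samples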

—————————————

Methods in LSMF are built on top of a computational framework for alternating non-negative least squares (ANNLS) NMF refined and published by Xihui "Eric" Lin in the NNLM R package. LSMF specifically uses alternating non-negative least squares with a sequential coordinate descent (SCD) least squares solver against a mean squared error (MSE) loss.
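
To illustrate the core update, below is a minimal pure-R sketch of one SCD pass for a single non-negative least squares problem, min ||a - W h||^2 subject to h >= 0. The function is illustrative only; the package performs these updates in parallelized C++:

scd.nnls.pass <- function(W, a, h) {
  WtW <- crossprod(W)       # t(W) %*% W, k x k Gram matrix
  Wta <- crossprod(W, a)    # t(W) %*% a, k x 1
  for (j in seq_along(h)) {
    # negative gradient of 0.5 * ||a - W %*% h||^2 at coordinate j
    grad <- Wta[j] - sum(WtW[j, ] * h)
    # coordinate-wise Newton step, clipped at zero for non-negativity
    h[j] <- max(0, h[j] + grad / WtW[j, j])
  }
  h
}

# repeated passes converge to the NNLS solution for this column
set.seed(1)
W <- matrix(runif(30), 10, 3)
a <- as.vector(W %*% c(1, 0, 2))  # exact solution is h = c(1, 0, 2)
h <- rep(0, 3)
for (i in 1:50) h <- scd.nnls.pass(W, a, h)
round(h, 3)  # approximately 1, 0, 2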

Value

A list of model components, including the factor matrices W and H.

Author(s)

Zach DeBruine, zach.debruine@vai.org

References

Lin, X., and Boutros, P.C. (2020). "Optimization and expansion of non-negative matrix factorization." BMC Bioinformatics.

Kuang, D., Yun, S., and Park, H. (2015). "SymNMF: nonnegative low-rank approximation of a similarity matrix for graph clustering." Journal of Global Optimization.

Lee, D.D., and Seung, H.S. (1999). "Learning the parts of objects by non-negative matrix factorization." Nature.

Alternating least squares factorization

See Also

ls.project, mse

Examples

## Not run: 
data(moca7k)
# run lsmf with default parameters
model <- lsmf(moca7k, k = 10, trace = 5)

# with L1 regularization on W and a larger tolerance
model <- lsmf(moca7k, k = 10, alpha.L1 = c(0.5, 0), trace = 5, tol = 1e-2)

# prepare a similarity of first 1000 samples
#   in moca7k dataset, and induce 75% dropout
library(Matrix)
data(moca7k)
m.init <- as(sparse.cos(moca7k[,1:1000]), "dgCMatrix")
m <- m.init * rsparsematrix(1000, 1000, 0.25, symmetric = TRUE, rand.x = NULL)
diag(m) <- 1

# run lsmf.sym with default parameters
model <- lsmf.sym(m, k = 10)

# compare similarity between W.mean and W
mean(diag(cor(model$W, model$W.mean)))
# [1] 0.994232

# plot W against t(H)
plot(model$W, t(model$H))
# plot looks pretty good, but notice the shift of low-weighted values towards W;
#  this is because W is initialized first.
#  lowering the tolerance removes that shift, for instance:
model2 <- lsmf.sym(m, k = 10, tol = 1e-6)
plot(model2$W, t(model2$H))
# basically a perfect fit!

# let's factorize it again and see if we get the same result, 
#   note that factor orderings may be scrambled
model3 <- lsmf.sym(m, k = 10, tol = 1e-6)
model3.reordered <- match.factors(model2$W.mean, model3$W.mean)$m.aligned
# plot first matched pair
plot(model3.reordered, model2$W.mean)
# in general there is decent agreement, but multiple factorizations
#   would be useful in finding robust minima
# note that some factors are highly robust while others are not
plot(diag(cor(model3.reordered, model2$W.mean)))

# take for instance the best fit vector or the worst fit vector and compare them
best.pair <- which.max(diag(cor(model3.reordered, model2$W.mean)))
worst.pair <- which.min(diag(cor(model3.reordered, model2$W.mean)))
plot(model3.reordered[, best.pair], model2$W.mean[, best.pair])
plot(model3.reordered[, worst.pair], model2$W.mean[, worst.pair])

# ZERO-MASKING
# Examine zero-masking for imputing signal by comparing "m" to "m.init",
#  since signal dropout in "m" is 75%
model.orig <- lsmf.sym(m.init, k = 10, tol = 1e-6)
model.zm <- lsmf.sym(m, k = 10, tol = 1e-6, mask.zeros = TRUE)
model.nzm <- lsmf.sym(m, k = 10, tol = 1e-6, mask.zeros = FALSE)

# compare model.zm and model.nzm to model.orig, after matching
W.orig <- model.orig$W
W.zm <- match.factors(W.orig, model.zm$W)$m.aligned
W.nzm <- match.factors(W.orig, model.nzm$W)$m.aligned
plot(W.orig, W.zm)
plot(W.orig, W.nzm)
# zero-masking imputes signal dropout excellently!

## End(Not run)
