
Solves the equation *A = wh* for either *h* or *w*, given the other and *A*.

**Usage.** `project(A, w = NULL, h = NULL, nonneg = TRUE, L1 = 0, mask_zeros = FALSE)`

**Arguments.**

| Argument | Description |
| --- | --- |
| `A` | matrix of features-by-samples in dense or sparse format (preferred classes are "matrix" or "Matrix::dgCMatrix", respectively). Prefer sparse storage when more than half of all values are zero. |
| `w` | dense matrix of factors x features giving the linear model to be projected (if `h` is not given) |
| `h` | dense matrix of factors x samples giving the linear model to be projected (if `w` is not given) |
| `nonneg` | enforce non-negativity of the solution |
| `L1` | L1/LASSO penalty to be applied. No scaling is performed. See details. |
| `mask_zeros` | handle zeros as missing values, available only when `A` is sparse |
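For orientation before the details, here is a minimal sketch of a sparse projection with `mask_zeros = TRUE`; the matrix sizes and the ~50% dropout rate are arbitrary illustrative choices, not package defaults:

```
library(Matrix)
library(RcppML)
set.seed(1)
w <- matrix(runif(200 * 4), 200, 4)          # features x factors
# build a sparse A with roughly half of its entries zeroed out
A <- as(w %*% matrix(runif(4 * 50), 4, 50) *
          (rsparsematrix(200, 50, 0.5) != 0), "CsparseMatrix")
h <- project(A, w, mask_zeros = TRUE)        # zeros handled as missing values
dim(h)                                       # factors x samples: 4 x 50
```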

**Details.** For the classical alternating least squares matrix factorization update problem *A = wh*, the update (or projection) of *h* is given by the normal equations:

*w^T w h_j = w^T A_j*

which is in the form *ax = b*, where *a = w^T w*, *x = h_j*, and *b = w^T A_j*, for all columns *j* in *A*.
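To make the *ax = b* form concrete, a minimal sketch that solves the normal equations directly in base R and compares against `project` (assuming `nonneg = FALSE` disables the non-negativity constraint, per the arguments above, so the unconstrained solutions should agree to numerical tolerance):

```
library(RcppML)
set.seed(1)
w <- matrix(runif(100 * 5), 100, 5)    # features x factors
A <- w %*% matrix(runif(5 * 20), 5, 20)
a <- crossprod(w)                      # a = w^T w          (5 x 5)
b <- crossprod(w, A)                   # b = w^T A_j, all j (5 x 20)
h_manual <- solve(a, b)                # solves a x = b column-wise
h_proj <- project(A, w, nonneg = FALSE)
all.equal(h_manual, h_proj, check.attributes = FALSE)
```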

Given *A*, `project` can solve for either *w* or *h* given the other:

- When given *A* and *w*, *h* is found using a highly efficient parallelization scheme.
- When given *A* and *h*, *w* is found without transposition of *A* (as would be the case in traditional block-pivoting matrix factorization) by accumulating the right-hand sides of the linear systems in-place in *A*, then solving the systems. Note that *w* may also be found by inputting the transpose of *A* and *h* in place of *w* (i.e. `A = t(A), w = h, h = NULL`); see the Examples below. For most applications, the cost of a single in-place projection is less than that of transposing *A*. For matrix factorization, however, it is desirable to transpose *A* if possible, because many projections are needed.

**Parallelization.** Least squares projections in factorizations of rank-3 and greater are parallelized using the number of threads set by `setRcppMLthreads`. By default, all available threads are used; see `getRcppMLthreads`. The overhead of parallelization is too great for rank-1 and rank-2 factorizations.
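A minimal sketch of the thread controls named above (assuming, per the package's convention, that a setting of 0 requests all available threads):

```
library(RcppML)
getRcppMLthreads()   # current setting; 0 means "use all available threads"
setRcppMLthreads(2)  # limit rank-3+ projections to 2 threads
getRcppMLthreads()
setRcppMLthreads(0)  # restore the default
```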

**L1 Regularization.** Any L1 penalty is subtracted from *b* and should generally be scaled to `max(b)`, where *b = w^T A_j* for all columns *j* in *A*. No scaling is applied in this function. An easy way to properly scale an L1 penalty is to normalize all columns in *w* to sum to 1; such scaling guarantees that `L1 = 1` gives a completely sparse solution.
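A minimal sketch of that normalization advice; the matrix sizes and the `L1 = 0.9` penalty are arbitrary choices for illustration:

```
library(RcppML)
set.seed(1)
w <- matrix(runif(100 * 5), 100, 5)
A <- w %*% matrix(runif(5 * 20), 5, 20)
w_scaled <- sweep(w, 2, colSums(w), "/")    # each column of w sums to 1
h_plain  <- project(A, w_scaled, L1 = 0)
h_sparse <- project(A, w_scaled, L1 = 0.9)  # strong, properly scaled penalty
mean(h_sparse == 0) > mean(h_plain == 0)    # expect more zeros under L1
```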

**Specializations.** There are specializations for symmetric input matrices and for rank-1 and rank-2 projections. See the documentation for `nmf` for theoretical details and guidance.
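For instance, a rank-1 projection (which takes the specialized code path mentioned above) is simply a projection against a one-column model; a toy sketch:

```
library(RcppML)
set.seed(1)
w1 <- matrix(runif(100), 100, 1)       # a single factor
A <- w1 %*% matrix(runif(50), 1, 50)
h1 <- project(A, w1)                   # rank-1 specialization
dim(h1)                                # 1 x 50
```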

**Publication reference.** For theoretical and practical considerations, please see our manuscript: DeBruine ZJ, Melcher K, Triche TJ (2021), "High-performance non-negative matrix factorization for large single-cell data," bioRxiv.

**Value.** The projected matrix: *h* (when *w* is given) or *w* (when *h* is given).

**Author.** Zach DeBruine

**References.** DeBruine, Z.J., Melcher, K., and Triche, T.J. (2021). "High-performance non-negative matrix factorization for large single-cell data." bioRxiv.

**Examples.**

```
## Not run:
library(Matrix)
library(RcppML)
w <- matrix(runif(1000 * 10), 1000, 10)
h_true <- matrix(runif(10 * 100), 10, 100)
# A is the product of "w" and "h_true" with random signal dropout
A <- (w %*% h_true) * (rsparsematrix(1000, 100, 0.9) > 0)
h <- project(A, w)
cor(as.vector(h_true), as.vector(h))
# alternating projections refine solution (like NMF)
mse_bad <- mse(A, w, rep(1, ncol(w)), h) # mse before alternating updates
h <- project(A, w = w)
w <- project(A, h = h)
h <- project(A, w)
w <- project(A, h = h)
h <- project(A, w)
w <- t(project(A, h = h))
mse_better <- mse(A, w, rep(1, ncol(w)), h) # mse after alternating updates
mse_better < mse_bad
# two ways to solve for "w" that give the same solution
w <- project(A, h = h)
w2 <- project(t(A), w = t(h))
all.equal(w, w2)
## End(Not run)
```
