nnmf: Non-negative matrix factorization

Description Usage Arguments Details Value Author(s) References See Also Examples

Description

Non-negative matrix factorization(NMF or NNMF) using sequential coordinate-wise descent or multiplicative updates

Usage

1
2
3
4
5
6
nnmf(A, k = 1L, alpha = rep(0, 3), beta = rep(0, 3),
  method = c("scd", "lee"), loss = c("mse", "mkl"), init = NULL,
  mask = NULL, W.norm = -1L, check.k = TRUE, max.iter = 500L,
  rel.tol = 1e-04, n.threads = 1L, trace = 100/inner.max.iter,
  verbose = 1L, show.warning = TRUE, inner.max.iter = ifelse("mse" ==
  loss, 50L, 1L), inner.rel.tol = 1e-09)

Arguments

A

A matrix to be decomposed

k

An integer of decomposition rank

alpha

[L2, angle, L1] regularization on W (non-masked entries)

beta

[L2, angle, L1] regularization on H (non-masked entries)

method

Decomposition algorithms, either 'scd' for sequential coordinate-wise descent(default) or 'lee' for Lee's multiplicative algorithm

loss

Loss function to use, either 'mse' for mean square error (default) or 'mkl' for mean KL-divergence

init

A list of initial matrices for W and H. One can also supply known matrices W0, H0, and initialize their correspondent matrices H1 and W1. See details

mask

A list of mask matrices for W, H, H1 (if init$W0 supplied), W1 (if init$H0 supplied), which should have the same shapes as W, H, H1 and W1 if specified. If initial matrices not specified, masked entries are fixed to 0. See details

W.norm

A numeric value 'p' indicating the L_p-norm (can be infinity) used to normalized the outcome W matrix. No normalization will be performed if 0 or negative. This argument has no effect on outcome correspondent to known profiles W0, H0. Default to 1, i.e. sum of W should sum up to 1, which can be interpreted as "distribution" or "proportion"

check.k

If to check whether k ≤ nm/(n+m), where (n,m)=dim(A), or k is smaller than the column-wise and row-wise minimum numbers of complete observation

max.iter

Maximum iteration of alternating NNLS solutions to H and W

rel.tol

Stop criterion, which is relative tolerance between two successive iterations, = |e2-e1|/avg(e1, e2)

n.threads

An integer number of threads/CPUs to use. Default to 1(no parallel). Specify 0 for all cores

trace

An integer indicating how frequent the error should be checked. MSE and MKL error will computed every trace iterations. If 0 or negative, trace is set to a very large number and only errors of the first and last iterations are checked.

verbose

Either 0/FALSE, 1/TRUE or 2, with 0/FALSE/else = no any tracking, 1 == progression bar, 2 == print iteration info.

show.warning

If to show warnings when targeted rel.tol is not reached

inner.max.iter

Maximum number of iterations passed to each inner W or H matrix updating loop

inner.rel.tol

Stop criterion for the inner loop, which is relative tolerance passed to inner W or H matrix updating i.e., |e2-e1|/avg(e1, e2)

Details

The problem of non-negative matrix factorization is to find W, H, W_1, H_1, such that

A = W H + W_0 H_1 + W_1 H_0 + \varepsilon = [W W_0 W_1] [H' H'_1 H'_0]' + \varepsilon


where W_0, H_0 are known matrices, which are NULLs in most application case and \varepsilon is noise.. In tumor content deconvolution, W_0 can be thought as known healthy profile, and W is desired pure cancer profile. One also set H_0 to a row matrix of 1, and thus W_1 can be treated as common profile among samples. Use init to specify W_0 and H_0.

Argument init, if used, must be a list with entries named as 'W', 'H', 'W0', 'W1', 'H1', 'H0'. One could specify only a few of them. Only use 'W0' (and its correspondent 'H1') or 'H0' (and its correspondent 'W1') for known matrices/profiles.

Similarly, argument mask, if used, must be a list entries named as 'W', 'H', 'W0', 'W1', 'H1', 'H0', and they should be either NULL (no specified) or a logical matrix. If a masked for matrix is specified, then masked entries will be fixed to their initial values if initialized (skipped during iteration), or 0 if not initialized.

To simplify the notations, we denote right hand side of the above equation as W H. The problem to solved using square error is

argmin_{W ≥ 0, H ≥ 0} L(A, W H) + J(W, α) + J(H', β)


where L(x, y) is a loss function either a square loss

\frac{1}{2} ||x-y||_2^2

or a Kullback-Leibler divergence

x \log (x/y) - x - y,

and

J(X, α) = α_1 J_1(X) + α_2 J_2(X) + α_3 J_3(X),

J_1(X) = \frac12 ||X||_F^2 = \frac{1}{2} tr(XX^T),

J_2(X) = ∑_{i < j} (X_{\cdot i})^TX_{\cdot j} = \frac12 tr(X(E-I)X^T),

J_3(X) = ∑_{i,j} |x_{ij}| = tr(XE).

The formal one is usually better for symmetric distribution, while the later one is more suitable for skewed distribution, especially for count data as it can be derived from Poisson distributed observation. The penalty function J is a composition of three types of penalties, which aim to minimizing L2 norm, maximizing angles between hidden features (columns of W and rows of H) and L1 norm (sparsity). The parameters α, β of length 3 indicates the amount of penalties.

When method == 'scd', a sequential coordinate-wise descent algorithm is used when solving W and H alternatively, which are non-negative regression problem. The inner.max.iter and inner.rel.tol is used to control the number of iteration for these non-negative regressions. This is also applicable to method == 'lee' (the original algorithm only iteration through all entries once for each iteration), which is usually faster than the original algorithm when loss == 'mse'. When loss == 'mkl', a quadratic approximation to the KL-divergence is used when method == 'scd'. Generally, for run time, 'scd' is faster than 'lee' and 'mse' is faster than 'mkl'.

Value

A list with components

Author(s)

Eric Xihui Lin, xihuil.silence@gmail.com

References

Franc, V. C., Hlavac, V. C., Navara, M. (2005). Sequential Coordinate-Wise Algorithm for the Non-negative Least Squares Problem. Proc. Int'l Conf. Computer Analysis of Images and Patterns. Lecture Notes in Computer Science 3691. p. 407.

Lee, Daniel D., and H. Sebastian Seung. 1999. "Learning the Parts of Objects by Non-Negative Matrix Factorization." Nature 401: 788-91.

Pascual-Montano, Alberto, J.M. Carazo, Kieko Kochi, Dietrich Lehmann, and Roberto D.Pascual-Marqui. 2006. "Nonsmooth Nonnegative Matrix Factorization (NsNMF)." IEEE Transactions on Pattern Analysis and Machine Intelligence 28 (3): 403-14.

See Also

nnlm, predict.nnmf

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
# Pattern extraction, meta-gene
set.seed(123);

data(nsclc, package = 'NNLM')
str(nsclc)

decomp <- nnmf(nsclc[, 1:80], 3, rel.tol = 1e-5);

heatmap(decomp$W, Colv = NA, xlab = 'Meta-gene', ylab = 'Gene', margins = c(2,2),
	labRow = '', labCol = '', scale = 'column', col = cm.colors(100));
heatmap(decomp$H, Rowv = NA, ylab = 'Meta-gene', xlab = 'Patient', margins = c(2,2),
	labRow = '', labCol = '', scale = 'row', col = cm.colors(100));

# missing value imputation
set.seed(123);
nsclc2 <- nsclc;
index <- sample(length(nsclc2), length(nsclc2)*0.3);
nsclc2[index] <- NA;

# impute using NMF
system.time(nsclc2.nmf <- nnmf(nsclc2, 2));
nsclc2.hat.nmf <- with(nsclc2.nmf, W %*% H);

mse.mkl(nsclc[index], nsclc2.hat.nmf[index])

NNLM documentation built on July 3, 2019, 1:03 a.m.