nnmf: Non-negative matrix factorization
In NNLM: Fast and Versatile Non-Negative Matrix Factorization

Description Usage Arguments Details Value Author(s) References See Also Examples

Non-negative matrix factorization(NMF or NNMF) using sequential coordinate-wise descent or multiplicative updates

nnmf(A, k = 1L, alpha = rep(0, 3), beta = rep(0, 3),
  method = c("scd", "lee"), loss = c("mse", "mkl"), init = NULL,
  mask = NULL, W.norm = -1L, check.k = TRUE, max.iter = 500L,
  rel.tol = 1e-04, n.threads = 1L, trace = 100/inner.max.iter,
  verbose = 1L, show.warning = TRUE, inner.max.iter = ifelse("mse" ==
  loss, 50L, 1L), inner.rel.tol = 1e-09)

`A`	A matrix to be decomposed
`k`	An integer of decomposition rank
`alpha`	[L2, angle, L1] regularization on W (non-masked entries)
`beta`	[L2, angle, L1] regularization on H (non-masked entries)
`method`	Decomposition algorithms, either 'scd' for sequential coordinate-wise descent(default) or 'lee' for Lee's multiplicative algorithm
`loss`	Loss function to use, either 'mse' for mean square error (default) or 'mkl' for mean KL-divergence
`init`	A list of initial matrices for W and H. One can also supply known matrices W0, H0, and initialize their correspondent matrices H1 and W1. See details
`mask`	A list of mask matrices for W, H, H1 (if `init$W0` supplied), W1 (if `init$H0` supplied), which should have the same shapes as W, H, H1 and W1 if specified. If initial matrices not specified, masked entries are fixed to 0. See details
`W.norm`	A numeric value 'p' indicating the L_p-norm (can be infinity) used to normalized the outcome `W` matrix. No normalization will be performed if 0 or negative. This argument has no effect on outcome correspondent to known profiles `W0, H0`. Default to 1, i.e. sum of W should sum up to 1, which can be interpreted as "distribution" or "proportion"
`check.k`	If to check whether k ≤ nm/(n+m), where (n,m)=dim(A), or k is smaller than the column-wise and row-wise minimum numbers of complete observation
`max.iter`	Maximum iteration of alternating NNLS solutions to H and W
`rel.tol`	Stop criterion, which is relative tolerance between two successive iterations, = \|e2-e1\|/avg(e1, e2)
`n.threads`	An integer number of threads/CPUs to use. Default to 1(no parallel). Specify 0 for all cores
`trace`	An integer indicating how frequent the error should be checked. MSE and MKL error will computed every `trace` iterations. If 0 or negative, trace is set to a very large number and only errors of the first and last iterations are checked.
`verbose`	Either 0/FALSE, 1/TRUE or 2, with 0/FALSE/else = no any tracking, 1 == progression bar, 2 == print iteration info.
`show.warning`	If to show warnings when targeted `rel.tol` is not reached
`inner.max.iter`	Maximum number of iterations passed to each inner W or H matrix updating loop
`inner.rel.tol`	Stop criterion for the inner loop, which is relative tolerance passed to inner W or H matrix updating i.e., \|e2-e1\|/avg(e1, e2)

The problem of non-negative matrix factorization is to find W, H, W_1, H_1, such that

A = W H + W_0 H_1 + W_1 H_0 + \varepsilon = [W W_0 W_1] [H' H'_1 H'_0]' + \varepsilon

where W_0, H_0 are known matrices, which are NULLs in most application case and \varepsilon is noise.. In tumor content deconvolution, W_0 can be thought as known healthy profile, and W is desired pure cancer profile. One also set H_0 to a row matrix of 1, and thus W_1 can be treated as common profile among samples. Use init to specify W_0 and H_0.

Argument init, if used, must be a list with entries named as 'W', 'H', 'W0', 'W1', 'H1', 'H0'. One could specify only a few of them. Only use 'W0' (and its correspondent 'H1') or 'H0' (and its correspondent 'W1') for known matrices/profiles.

Similarly, argument mask, if used, must be a list entries named as 'W', 'H', 'W0', 'W1', 'H1', 'H0', and they should be either NULL (no specified) or a logical matrix. If a masked for matrix is specified, then masked entries will be fixed to their initial values if initialized (skipped during iteration), or 0 if not initialized.

To simplify the notations, we denote right hand side of the above equation as W H. The problem to solved using square error is

argmin_{W ≥ 0, H ≥ 0} L(A, W H) + J(W, α) + J(H', β)

where L(x, y) is a loss function either a square loss

\frac{1}{2} ||x-y||_2^2

or a Kullback-Leibler divergence

x \log (x/y) - x - y,

and

J(X, α) = α_1 J_1(X) + α_2 J_2(X) + α_3 J_3(X),

J_1(X) = \frac12 ||X||_F^2 = \frac{1}{2} tr(XX^T),

J_2(X) = ∑_{i < j} (X_{\cdot i})^TX_{\cdot j} = \frac12 tr(X(E-I)X^T),

J_3(X) = ∑_{i,j} |x_{ij}| = tr(XE).

The formal one is usually better for symmetric distribution, while the later one is more suitable for skewed distribution, especially for count data as it can be derived from Poisson distributed observation. The penalty function J is a composition of three types of penalties, which aim to minimizing L2 norm, maximizing angles between hidden features (columns of W and rows of H) and L1 norm (sparsity). The parameters α, β of length 3 indicates the amount of penalties.

When method == 'scd', a sequential coordinate-wise descent algorithm is used when solving W and H alternatively, which are non-negative regression problem. The inner.max.iter and inner.rel.tol is used to control the number of iteration for these non-negative regressions. This is also applicable to method == 'lee' (the original algorithm only iteration through all entries once for each iteration), which is usually faster than the original algorithm when loss == 'mse'. When loss == 'mkl', a quadratic approximation to the KL-divergence is used when method == 'scd'. Generally, for run time, 'scd' is faster than 'lee' and 'mse' is faster than 'mkl'.

A list with components

W : left matrix, including known W0 and W1 if available, i.e., column stacked as [W, W0, W1]
H : right matrix, including H1 and known H0 if available, i.e. row stacked as [H', H1', H0']'
mse : a vector of mean squared errors through iterations
mkl : a vector of mean KL-divergence through iterations
target.loss : target for minimization, which is mean KL-divergence (if loss == 'mkl') or half of mean squared error if loss == 'mse' plus penalties
average.epochs : a vector of average epochs (one complete swap over W and H)
n.iteration : total number of iteration (sum over all column of beta)
run.time : running time
options : list of information of input arguments
call : function call

Eric Xihui Lin, xihuil.silence@gmail.com

Franc, V. C., Hlavac, V. C., Navara, M. (2005). Sequential Coordinate-Wise Algorithm for the Non-negative Least Squares Problem. Proc. Int'l Conf. Computer Analysis of Images and Patterns. Lecture Notes in Computer Science 3691. p. 407.

Lee, Daniel D., and H. Sebastian Seung. 1999. "Learning the Parts of Objects by Non-Negative Matrix Factorization." Nature 401: 788-91.

Pascual-Montano, Alberto, J.M. Carazo, Kieko Kochi, Dietrich Lehmann, and Roberto D.Pascual-Marqui. 2006. "Nonsmooth Nonnegative Matrix Factorization (NsNMF)." IEEE Transactions on Pattern Analysis and Machine Intelligence 28 (3): 403-14.

nnlm, predict.nnmf

# Pattern extraction, meta-gene
set.seed(123);

data(nsclc, package = 'NNLM')
str(nsclc)

decomp <- nnmf(nsclc[, 1:80], 3, rel.tol = 1e-5);

heatmap(decomp$W, Colv = NA, xlab = 'Meta-gene', ylab = 'Gene', margins = c(2,2),
	labRow = '', labCol = '', scale = 'column', col = cm.colors(100));
heatmap(decomp$H, Rowv = NA, ylab = 'Meta-gene', xlab = 'Patient', margins = c(2,2),
	labRow = '', labCol = '', scale = 'row', col = cm.colors(100));

# missing value imputation
set.seed(123);
nsclc2 <- nsclc;
index <- sample(length(nsclc2), length(nsclc2)*0.3);
nsclc2[index] <- NA;

# impute using NMF
system.time(nsclc2.nmf <- nnmf(nsclc2, 2));
nsclc2.hat.nmf <- with(nsclc2.nmf, W %*% H);

mse.mkl(nsclc[index], nsclc2.hat.nmf[index])