knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
Non-negative matrix factorization (NMF) is NP-hard (Vavasis, 2007). As such, the best that NMF can do, in practice, is find the best discoverable local minima from some set of initializations.
Non-negative Double SVD (NNDSVD) has previously been proposed as a "head-start" for NMF (Boutsidis, 2008), but never does better than random initializations and often does worse (see my blog post on this).
Simply, set the seed when calling the function.
library(RcppML) A <- r_sparsematrix(1000, 1000, inv_density = 16) not_reproducible_model <- nmf(A, k = 10, seed = NULL) # default reproducible_model <- nmf(A, k = 10, seed = 123)
Random uniform initializations are the best initializations because they are not local minima and do not assume prior distribution information. Using non-random initializations (like NNDSVD, first proposed by Boutsidis) can trap models in dangerous local minima and mandate a model inspired by orthogonality rather than colinearity.
Let's see how multiple restarts can improve a model. In RcppML, we simply specify multiple seeds to run multiple restarts:
```{R, message = FALSE, warning = FALSE} data(hawaiibirds) m1 <- nmf(hawaiibirds$counts, k = 10, seed = 1) m2 <- nmf(hawaiibirds$counts, k = 10, seed = 1:10)
```{R} evaluate(m1, hawaiibirds$counts)
evaluate(m2, hawaiibirds$counts)
Multiple random restarts help discover better models.
What happens when only a single seed is specified? RcppML uses runif
to generate a uniform distribution in a randomly selected range. rnorm
is not used just to be safe, because sometimes rnorm
can actually give worse results.
What happens when multiple seeds are specified? RcppML generates multiple initializations, for each initialization randomly choosing to use runif
or rnorm
, and then randomly selecting a uniform range or mean and standard deviation, respectively.
Suppose we learn an NMF model and want to come back later and fit it further. Alternatively, we might learn a model from one set of samples, and simply want to fit it to another set of samples.
Here we learn an NMF model on half of all survey grids in the hawaiibirds
dataset, and then fit it further on the other half:
A <- hawaiibirds$counts grids1 <- sample(1:ncol(A), floor(ncol(A)/2)) grids2 <- (1:ncol(A))[-grids1] model1 <- nmf(A[, grids1], k = 15, seed = 123) model2 <- nmf(A[, grids2], k = 15, seed = model1@w) cat("model 1 iterations: ", model1@misc$iter, ", model 2 iterations: ", model2@misc$iter)
Here use used model1 as a "warm start" for training model2, but this was not particularly helpful. Suppose we trained model2 from a "cold start", would we have done better (as measured by MSE)?
model2_new <- nmf(A[, grids2], k = 15, seed = 123) cat("model 2 warm-start", evaluate(model2_new, A[, grids2]), ", model 2 random start: ", evaluate(model2, A[, grids2]))
We only do slightly better.
In conclusion, it is almost always better to start from a random initialization. Using prior information is rarely helpful and does not appreciably speed up the fitting process.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.