nmfEstimateRank  R Documentation 
A critical parameter in NMF algorithms is the
factorization rank r. It defines the number of
basis effects used to approximate the target matrix.
The function nmfEstimateRank helps in choosing an
optimal rank by implementing simple approaches proposed
in the literature.
Note that from version 0.7, one can equivalently
call the function nmf
with a range of
ranks.
In the plot generated by plot.NMF.rank, each curve
represents a summary measure over the range of ranks in
the survey. The colours correspond to the type of data to
which the measure is related: coefficient matrix, basis
component matrix, best fit, or consensus matrix.
nmfEstimateRank(x, range, method = nmf.getOption("default.algorithm"),
    nrun = 30, model = NULL, ..., verbose = FALSE, stop = FALSE)

## S3 method for class 'NMF.rank'
plot(x, y = NULL,
    what = c("all", "cophenetic", "rss", "residuals", "dispersion", "evar",
        "sparseness", "sparseness.basis", "sparseness.coef", "silhouette",
        "silhouette.coef", "silhouette.basis", "silhouette.consensus"),
    na.rm = FALSE, xname = "x", yname = "y",
    xlab = "Factorization rank", ylab = "", main = "NMF rank survey", ...)
x 
For For 
range 
a 
method 
A single NMF algorithm, in one of the
formats accepted by the function 
nrun 
a 
model 
model specification passed to each

verbose 
toggle verbosity. This parameter only
affects the verbosity of the outer loop over the values
in 
stop 
logical flag for running the estimation
process with fault tolerance. When 
... 
For For 
y 
reference object of class 
what 
a 
na.rm 
single logical that specifies if the rank
for which the measures are NA values should be removed
from the graph or not (default to 
xname,yname 
legend labels for the curves
corresponding to measures from 
xlab 
x-axis label 
ylab 
y-axis label 
main 
main title 
Given an NMF algorithm and the target matrix, a common way of estimating r is to try different values, compute some quality measures of the results, and choose the best value according to these quality criteria. See Brunet et al. (2004) and Hutchins et al. (2008).
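The survey idea described above can be sketched in base R without the package. This is an illustration only, not the package's implementation: toy_nmf is a hypothetical helper that uses Lee-Seung multiplicative updates for the Frobenius norm (an assumption; the 'brunet' method is a different algorithm), the target matrix is simulated toy data, and the explained variance is assumed to be defined as 1 - RSS / sum(V^2).

```r
# Hypothetical helper: a minimal NMF using Lee-Seung multiplicative updates
# (Frobenius norm). Illustrative only; not the algorithm used by the package.
toy_nmf <- function(V, r, n_iter = 200, eps = 1e-9) {
  W <- matrix(runif(nrow(V) * r), nrow(V), r)
  H <- matrix(runif(r * ncol(V)), r, ncol(V))
  for (i in seq_len(n_iter)) {
    H <- H * (t(W) %*% V) / (t(W) %*% W %*% H + eps)
    W <- W * (V %*% t(H)) / (W %*% H %*% t(H) + eps)
  }
  list(W = W, H = H)
}

set.seed(1)
# Toy non-negative target matrix built from 3 underlying factors
V <- matrix(runif(20 * 3), 20, 3) %*% matrix(runif(3 * 15), 3, 15)

# Try each candidate rank and record two quality measures
ranks <- 2:5
survey <- data.frame(rank = ranks, rss = NA_real_, evar = NA_real_)
for (k in seq_along(ranks)) {
  fit <- toy_nmf(V, ranks[k])
  rss <- sum((V - fit$W %*% fit$H)^2)      # residual sum of squares
  survey$rss[k] <- rss
  survey$evar[k] <- 1 - rss / sum(V^2)     # assumed definition of explained variance
}
print(survey)
```

Inspecting how rss and evar change across the rows of survey mirrors, in a much reduced form, what the rank survey plot shows for the real measures.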
The function nmfEstimateRank performs this estimation
procedure. It runs NMF multiple times for each rank in a
range of factorization ranks and, for each rank,
returns a set of quality measures together with the
associated consensus matrix.
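To make the link between multiple runs, the consensus matrix and the cophenetic measure concrete, here is a minimal base R sketch in the spirit of Brunet et al. (2004). It is not the package's code: the helper names are hypothetical, the cluster labels are hand-written toy data standing in for the sample assignments of several NMF runs, and average linkage is assumed for the hierarchical clustering.

```r
# Connectivity matrix: entry (i, j) is 1 when samples i and j share a cluster
connectivity <- function(labels) {
  outer(labels, labels, "==") * 1
}

# Consensus matrix: average of the connectivity matrices over all runs
consensus_matrix <- function(label_runs) {
  Reduce(`+`, lapply(label_runs, connectivity)) / length(label_runs)
}

# Cophenetic correlation: correlation between the distances 1 - consensus
# and the cophenetic distances of an average-linkage clustering (assumed)
cophenetic_coef <- function(consensus) {
  d <- as.dist(1 - consensus)
  hc <- hclust(d, method = "average")
  cor(d, cophenetic(hc))
}

# Toy labels for 6 samples over 3 hypothetical runs at a given rank
stable_runs   <- list(c(1, 1, 1, 2, 2, 2), c(2, 2, 2, 1, 1, 1), c(1, 1, 1, 2, 2, 2))
unstable_runs <- list(c(1, 1, 1, 2, 2, 2), c(1, 2, 1, 2, 1, 2), c(1, 1, 2, 2, 1, 2))

stable_coef   <- cophenetic_coef(consensus_matrix(stable_runs))
unstable_coef <- cophenetic_coef(consensus_matrix(unstable_runs))
```

When every run groups the samples the same way (up to relabelling), the consensus matrix contains only 0s and 1s and the coefficient reaches 1; runs that disagree give a lower value.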
In order to avoid overfitting, it is recommended to run
the same procedure on randomized data. The results on the
original and the randomized data may be plotted on the
same plots, using argument y.
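As a rough illustration of what "randomized data" can mean here, the following base R sketch shuffles the entries of each column independently, which destroys any structure shared across columns while keeping each column's values. The helper randomize_columns is hypothetical; whether it matches the behaviour of the package's randomize function exactly is an assumption.

```r
# Hypothetical helper: permute the entries of each column independently
randomize_columns <- function(V) {
  apply(V, 2, function(col) col[sample.int(length(col))])
}

set.seed(42)
V  <- matrix(runif(12), nrow = 4, ncol = 3)  # toy non-negative matrix
rV <- randomize_columns(V)

# Each column of rV holds the same values as the matching column of V,
# only in a different order
```

A rank survey run on such a matrix gives a baseline against which the measures obtained on the original data can be compared.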
nmfEstimateRank returns an S3 object (i.e. a list)
of class NMF.rank with the following elements:
measures 
a 
consensus 
a 
fit 
a 
Brunet J, Tamayo P, Golub TR and Mesirov JP (2004). "Metagenes and molecular pattern discovery using matrix factorization." _Proceedings of the National Academy of Sciences of the United States of America_, *101*(12), pp. 4164-9. ISSN 0027-8424, <URL: http://dx.doi.org/10.1073/pnas.0308531101>, <URL: http://www.ncbi.nlm.nih.gov/pubmed/15016911>.
Hutchins LN, Murphy SM, Singh P and Graber JH (2008). "Position-dependent motif characterization using non-negative matrix factorization." _Bioinformatics (Oxford, England)_, *24*(23), pp. 2684-90. ISSN 1367-4811, <URL: http://dx.doi.org/10.1093/bioinformatics/btn526>, <URL: http://www.ncbi.nlm.nih.gov/pubmed/18852176>.
if( !isCHECK() ){

set.seed(123456)
n <- 50; r <- 3; m <- 20
V <- syntheticNMF(n, r, m)

# Use a seed that will be set before each first run
res <- nmfEstimateRank(V, seq(2,5), method='brunet', nrun=10, seed=123456)
# or equivalently
res <- nmf(V, seq(2,5), method='brunet', nrun=10, seed=123456)

# plot all the measures
plot(res)
# or only one: e.g. the cophenetic correlation coefficient
plot(res, 'cophenetic')

# run same estimation on randomized data
rV <- randomize(V)
rand <- nmfEstimateRank(rV, seq(2,5), method='brunet', nrun=10, seed=123456)
plot(res, rand)

}