The function nmf is an S4 generic that defines the main interface to run NMF algorithms within the framework defined in package NMF.
It has many methods that facilitate applying, developing and testing NMF algorithms.
The package vignette vignette('NMF') contains an introduction to the interface, through a sample data analysis.
nmf(x, rank, method, ...)
## S4 method for signature 'data.frame,ANY,ANY'
nmf(x, rank, method, ...)
## S4 method for signature 'mMatrix,numeric,'NULL''
nmf(x, rank, method, seed = NULL, model = NULL, ...)
## S4 method for signature 'mMatrix,numeric,list'
nmf(x, rank, method, ..., .parameters = list())
## S4 method for signature 'mMatrix,numeric,character'
nmf(x, rank, method, ...)
## S4 method for signature 'mMatrix,numeric,'function''
nmf(
x,
rank,
method,
seed,
model = "NMFstd",
...,
name,
objective = "euclidean",
mixed = FALSE
)
## S4 method for signature 'mMatrix,NMF,ANY'
nmf(x, rank, method, seed, ...)
## S4 method for signature 'mMatrix,'NULL',ANY'
nmf(x, rank, method, seed, ...)
## S4 method for signature 'mMatrix,missing,ANY'
nmf(x, rank, method, ...)
## S4 method for signature 'mMatrix,numeric,missing'
nmf(x, rank, method, ...)
## S4 method for signature 'mMatrix,matrix,ANY'
nmf(x, rank, method, seed, model = list(), ...)
## S4 method for signature 'mMatrix,data.frame,ANY'
nmf(x, rank, method, ...)
## S4 method for signature 'formula,ANY,ANY'
nmf(x, rank, method, ..., model = NULL)
## S4 method for signature 'mMatrix,numeric,NMFStrategy'
nmf(
x,
rank,
method,
seed = nmf.getOption("default.seed"),
rng = NULL,
nrun = if (length(rank) > 1L) 30 else 1,
model = NULL,
.options = list(),
.pbackend = nmf.getOption("pbackend"),
.callback = NULL,
.tmpdir = getwd(),
...
)
x: target data to fit, i.e. a matrix-like object.
rank: specification of the factorization rank. It is usually a single numeric value, but other types of values are possible (e.g. a matrix), for which specific methods are implemented. See for example methods If
method: specification of the NMF algorithm. The most common way of specifying the algorithm is to pass the access key (i.e. a character string) of an algorithm stored in the package's dedicated registry, but methods exist that handle other types of values, such as If Cases where the algorithm is inferred from the call are when an NMF model is passed in arguments
...: extra arguments to allow extension of the generic. Arguments that are not used in the chain of internal calls to
seed: specification of the starting point or seeding method, which will compute a starting point, usually using data from the target matrix in order to provide a good guess. The seeding method may be specified in the following way:
model: specification of the type of NMF model to use. It is used to instantiate the object that inherits from class
Argument/slot conflicts: in the case a parameter of the algorithm has the same name as a model slot, then If a variable appears in both arguments
.parameters: list of method-specific parameters. Its elements must have names matching a single method listed in
name: name associated with the NMF algorithm implemented by the function
objective: specification of the objective function associated with the algorithm implemented by the function It may be either
mixed: a logical that indicates if the algorithm implemented by the function
rng: RNG specification for the run(s). This argument should be used to set the RNG seed, while still specifying the seeding method in argument seed.
nrun: number of runs to perform. By default only one run is performed, except if rank is a numeric vector with more than one element, in which case 30 runs are performed (see the default value of nrun in the usage above). When using a random seeding method, multiple runs are generally required to achieve stability and avoid bad local minima.
.options: this argument is used to set runtime options. It can be a The string must be composed of characters that correspond to a given option (see mapping below), and modifiers '+' and '-' that toggle options on and off respectively. E.g. Modifiers '+' and '-' apply to all option characters found after them: for options that accept integer values, the value may be appended to the option's character e.g. The following options are available (the characters after “-” are those to use to encode
.pbackend: specification of the Currently it accepts the following values:
.callback: Used when option The call is wrapped into a tryCatch so that callback errors do not stop the whole computation (see below). The results of the different calls to the callback function are stored in a miscellaneous slot accessible using the method If no error occurs See the examples for sample code.
.tmpdir: path to the directory where a temporary directory is created to store intermediate results. This is only relevant for multi-runs performed using a foreach backend (including the sequential backend
The nmf function has multiple methods that compose a very flexible interface allowing users to:
combine NMF algorithms with seeding methods and/or stopping/convergence criteria at runtime;
perform multiple NMF runs, which are computed in parallel whenever the host machine allows it;
run multiple algorithms with a common set of parameters, ensuring a consistent environment (notably the RNG settings).
The workhorse method is nmf,matrix,numeric,NMFStrategy, which is eventually called by all other methods.
The other methods provide convenient ways of specifying the NMF algorithm(s), the factorization rank, or the seed to be used.
Some allow users to run NMF algorithms directly on different types of objects, such as data.frame or ExpressionSet objects.
The returned value depends on the run mode:
Single run: An object of class
Multiple runs, single method: When
Multiple runs, multiple methods: When
Lee and Seung's multiplicative updates are used by several NMF algorithms. To improve speed and memory usage, a C++ implementation of the specific matrix products is used whenever possible. It directly computes the updates for each entry in the updated matrix, instead of using multiple standard matrix multiplications.
The algorithms that benefit from this optimization are: 'brunet', 'lee', 'nsNMF' and 'offset'. However, plain R versions of these methods still exist, which implement the updates as standard matrix products. These are accessible by adding the prefix '.R#' to their name: '.R#brunet', '.R#lee', '.R#nsNMF' and '.R#offset'.
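For reference, the Euclidean multiplicative updates of Lee and Seung (2001), written as standard matrix products, can be sketched in plain R as below. This is an illustrative sketch, not the package's own '.R#lee' code; the helper names lee_update and euclid, and the eps guard against division by zero, are assumptions of this example.

```r
# Plain-R sketch of Lee & Seung (2001) multiplicative updates for the
# Euclidean distance ||V - W H||^2, using standard matrix products.
# eps (an assumption of this sketch) guards against division by zero.
lee_update <- function(V, W, H, eps = 1e-9) {
  H <- H * crossprod(W, V) / (crossprod(W) %*% H + eps)    # t(W) V / t(W) W H
  W <- W * tcrossprod(V, H) / (W %*% tcrossprod(H) + eps)  # V t(H) / W H t(H)
  list(W = W, H = H)
}

# Squared Euclidean distance between V and its reconstruction W H
euclid <- function(V, W, H) sum((V - W %*% H)^2)

set.seed(42)
V <- matrix(runif(20 * 10), 20, 10)  # nonnegative target matrix
W <- matrix(runif(20 * 2), 20, 2)    # random starting basis (rank 2)
H <- matrix(runif(2 * 10), 2, 10)    # random starting coefficients
d0 <- euclid(V, W, H)
for (i in 1:100) {
  fit <- lee_update(V, W, H)
  W <- fit$W
  H <- fit$H
}
d1 <- euclid(V, W, H)
c(before = d0, after = d1)
```

Because each update only rescales entries by nonnegative ratios, W and H stay nonnegative, and the distance is expected not to increase from one iteration to the next.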
All algorithms are accessible by their respective access key as listed below. The following algorithms are available:
'brunet': Standard NMF, based on the Kullback-Leibler divergence, from Brunet et al. (2004). It uses simple multiplicative updates from Lee and Seung (2001), enhanced to avoid numerical underflow.
Default stopping criterion: invariance of the connectivity matrix (see nmf.stop.connectivity).
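A minimal plain-R sketch of the Kullback-Leibler multiplicative updates used by 'brunet' follows. Flooring entries at a small eps is one simple way to avoid underflow and is an assumption here, not necessarily the safeguard the package uses; brunet_update and kl_div are hypothetical names.

```r
# Plain-R sketch of the Kullback-Leibler multiplicative updates (Lee & Seung
# 2001) used by Brunet et al. (2004). Flooring entries at eps is an assumed,
# simple guard against underflow to zero.
brunet_update <- function(V, W, H, eps = .Machine$double.eps) {
  # H[a, j] <- H[a, j] * sum_i W[i, a] V[i, j] / (W H)[i, j] / sum_i W[i, a]
  H <- H * crossprod(W, V / (W %*% H)) / colSums(W)
  H <- pmax(H, eps)
  # W[i, a] <- W[i, a] * sum_j H[a, j] V[i, j] / (W H)[i, j] / sum_j H[a, j]
  W <- W * sweep(tcrossprod(V / (W %*% H), H), 2, rowSums(H), "/")
  W <- pmax(W, eps)
  list(W = W, H = H)
}

# Generalised KL divergence D(V || W H), assuming V has no zero entries
kl_div <- function(V, W, H) {
  WH <- W %*% H
  sum(V * log(V / WH) - V + WH)
}

set.seed(1)
V <- matrix(runif(20 * 10, 0.1, 1), 20, 10)
W <- matrix(runif(20 * 3), 20, 3)
H <- matrix(runif(3 * 10), 3, 10)
k0 <- kl_div(V, W, H)
for (i in 1:200) {
  fit <- brunet_update(V, W, H)
  W <- fit$W
  H <- fit$H
}
k1 <- kl_div(V, W, H)
```

Dividing the 3 x 10 product by colSums(W) relies on R recycling a length-3 vector down each column, which matches row a of H with column sum a of W; the W update needs sweep() because its divisor runs along columns instead.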
'lee': Standard NMF based on the Euclidean distance from Lee and Seung (2001). It uses simple multiplicative updates.
Default stopping criterion: invariance of the connectivity matrix (see nmf.stop.connectivity).
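Several algorithms above stop when the connectivity matrix no longer changes. The sketch below, assuming each sample is assigned to the basis component with the largest coefficient, computes that matrix and wraps an invariance check in a closure. The names connectivity and make_connectivity_stop, and the default window of 40 unchanged checks, are illustrative assumptions, not the internals of nmf.stop.connectivity.

```r
# Connectivity matrix: samples j and k are connected (1) when their dominant
# basis component, i.e. the row of H with the largest coefficient, is the same.
connectivity <- function(H) {
  cl <- apply(H, 2, which.max)   # cluster label of each sample (column of H)
  outer(cl, cl, "==") * 1        # 1 if same cluster, 0 otherwise
}

# Hypothetical stopping rule: TRUE once the connectivity matrix has stayed
# unchanged for 'stable' consecutive checks. The closure keeps the previous
# matrix and a counter between calls.
make_connectivity_stop <- function(stable = 40) {
  prev <- NULL
  count <- 0
  function(H) {
    C <- connectivity(H)
    if (!is.null(prev) && identical(C, prev)) {
      count <<- count + 1
    } else {
      count <<- 0
    }
    prev <<- C
    count >= stable
  }
}

H <- matrix(c(0.9, 0.1,   # sample 1 -> component 1
              0.2, 0.8,   # sample 2 -> component 2
              0.7, 0.3),  # sample 3 -> component 1
            nrow = 2)
connectivity(H)
```

In an iterative fit, the closure would be called on the current H at each check and the loop stopped once it returns TRUE.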
'ls-nmf': Least-Square NMF from Wang et al. (2006). It uses modified versions of Lee and Seung's multiplicative updates for the Euclidean distance, which incorporate weights on each entry of the target matrix, e.g. to reflect measurement uncertainty.
Default stopping criterion: stationarity of the objective function (see nmf.stop.stationary).
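The weighting idea behind LS-NMF can be sketched with weighted multiplicative updates that minimise sum(Wt * (V - W H)^2), where Wt holds per-entry weights such as inverse variances. This is a simplified illustration under that assumption, not the package's implementation; lsnmf_update, weighted_rss and the simulated uncertainties are hypothetical.

```r
# Sketch of weighted Euclidean multiplicative updates in the spirit of LS-NMF:
# each entry of V has a weight Wt[i, j] (e.g. an inverse variance) and the fit
# minimises sum(Wt * (V - W H)^2). eps avoids division by zero.
lsnmf_update <- function(V, Wt, W, H, eps = 1e-9) {
  H <- H * crossprod(W, Wt * V) / (crossprod(W, Wt * (W %*% H)) + eps)
  W <- W * tcrossprod(Wt * V, H) / (tcrossprod(Wt * (W %*% H), H) + eps)
  list(W = W, H = H)
}

weighted_rss <- function(V, Wt, W, H) sum(Wt * (V - W %*% H)^2)

set.seed(7)
V   <- matrix(runif(15 * 8), 15, 8)
unc <- matrix(runif(15 * 8, 0.1, 1), 15, 8)  # simulated per-entry std. errors
Wt  <- 1 / unc^2                             # inverse-variance weights
W   <- matrix(runif(15 * 3), 15, 3)
H   <- matrix(runif(3 * 8), 3, 8)
r0 <- weighted_rss(V, Wt, W, H)
for (i in 1:100) {
  fit <- lsnmf_update(V, Wt, W, H)
  W <- fit$W
  H <- fit$H
}
r1 <- weighted_rss(V, Wt, W, H)
```

With all weights equal to 1, these updates reduce to the plain Euclidean updates; entries with large uncertainty get small weights and so influence the fit less.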
'nsNMF': Nonsmooth NMF from Pascual-Montano et al. (2006). It uses a modified version of Lee and Seung's multiplicative updates for the Kullback-Leibler divergence (Lee and Seung, 2001) to fit an extension of the standard NMF model that includes an intermediate smoothing matrix, meant to produce sparser factors.
Default stopping criterion: invariance of the connectivity matrix (see nmf.stop.connectivity).
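The intermediate smoothing matrix of nsNMF is commonly written as S = (1 - theta) I + (theta / r) 1 1^T for a smoothing parameter 0 <= theta <= 1, giving the model V ~ W S H. The helper smoothing_matrix below is a hypothetical sketch that builds S and a reconstruction from it.

```r
# Smoothing matrix of the nonsmooth NMF model V ~ W S H:
# S = (1 - theta) * I + (theta / r) * ones(r, r), with 0 <= theta <= 1.
smoothing_matrix <- function(r, theta = 0.5) {
  stopifnot(theta >= 0, theta <= 1)
  (1 - theta) * diag(r) + (theta / r) * matrix(1, r, r)
}

S <- smoothing_matrix(3, theta = 0.5)
rowSums(S)                     # each row sums to 1

set.seed(3)
W <- matrix(runif(6 * 3), 6, 3)
H <- matrix(runif(3 * 4), 3, 4)
fitted_nsnmf <- W %*% S %*% H  # nsNMF reconstruction of a 6 x 4 target
```

With theta = 0, S is the identity and the standard model is recovered; larger values blend the components together, which the model compensates for with sparser W and H.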
'offset': NMF with offset from Badea (2008). It uses a modified version of Lee and Seung's multiplicative updates for the Euclidean distance (Lee and Seung, 2001) to fit an NMF model that includes an intercept, meant to capture a common baseline and shared patterns, in order to produce cleaner basis components.
Default stopping criterion: invariance of the connectivity matrix (see nmf.stop.connectivity).
'pe-nmf': Pattern-Expression NMF from Zhang et al. (2008). It uses multiplicative updates to minimize an objective function based on the Euclidean distance, regularized for effective expression of patterns with basis vectors.
Default stopping criterion: stationarity of the objective function (see nmf.stop.stationary).
'snmf/r', 'snmf/l': Alternating Least Square (ALS) approach from Kim and Park (2007).
It applies the nonnegative least-squares algorithm from Van Benthem and Keenan (2004) (i.e. fast combinatorial nonnegative least-squares for multiple right-hand sides) to estimate the basis and coefficient matrices alternately (see fcnnls).
It minimises a Euclidean-based objective function that is regularized to favour sparse basis matrices (for ‘snmf/l’) or sparse coefficient matrices (for ‘snmf/r’).
Stopping criterion: built-in within the internal workhorse function nmf_snmf, based on the KKT optimality conditions.
The purpose of seeding methods is to compute initial values for the factor matrices in a given NMF model. This initial guess will be used as a starting point by the chosen NMF algorithm.
The seeding method to use in combination with the algorithm can be passed to interface nmf through argument seed.
The seeding methods available in the registry are listed by the function nmfSeed (see list therein).
Detailed examples of how to specify the seeding method and its parameters can be found in the Examples section of this man page and in the package's vignette.
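As an illustration of what a seeding method computes, the sketch below draws both factors uniformly on [0, max(V)] so the starting point is on the scale of the data, and optionally fixes the RNG seed for reproducibility. random_seed and its rng argument are hypothetical; nmfSeed() lists the seeding methods the package actually provides.

```r
# Hypothetical random seeding method: both factors are drawn uniformly on
# [0, max(V)], so the starting point is on the scale of the data.
# Setting 'rng' fixes the RNG seed and makes the starting point reproducible.
random_seed <- function(V, r, rng = NULL) {
  if (!is.null(rng)) set.seed(rng)
  m <- max(V)
  list(W = matrix(runif(nrow(V) * r, 0, m), nrow(V), r),
       H = matrix(runif(r * ncol(V), 0, m), r, ncol(V)))
}

V  <- matrix(runif(20 * 10, 0, 5), 20, 10)
s1 <- random_seed(V, 2, rng = 123)
s2 <- random_seed(V, 2, rng = 123)
identical(s1, s2)  # same rng value, same starting point
```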
nmf(x = data.frame,rank = ANY,method = ANY): Fits an NMF model on a data.frame.
The target data.frame is coerced into a matrix with as.matrix.
nmf(x = mMatrix,rank = numeric,method = NULL): Fits an NMF model using an appropriate algorithm when method is not supplied.
This method tries to select an appropriate algorithm amongst the NMF algorithms stored in the internal algorithm registry, which records the type of NMF model each algorithm can fit.
This is possible when the type of NMF model to fit is available from argument seed, i.e. if it is an NMF model itself.
Otherwise the algorithm to use is obtained from nmf.getOption('default.algorithm').
This method is provided for internal usage, when called from other nmf methods with argument method missing in the top call (e.g. nmf,matrix,numeric,missing).
nmf(x = mMatrix,rank = numeric,method = list): Fits multiple NMF models on a common matrix using a list of algorithms.
The models are fitted sequentially with nmf using the same options and parameters for all algorithms.
In particular, irrespective of the way the computation is seeded, this method ensures that all fits are performed using the same initial RNG settings.
This method returns an object of class NMFList, which is essentially a list containing each fit.
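The guarantee that every algorithm starts from the same RNG settings can be sketched in base R by capturing .Random.seed once and restoring it before each fit. run_all and the two stand-in "algorithms" below are hypothetical; they only draw a random starting matrix so that the effect is easy to check.

```r
# Sketch: run several "algorithms" from the same initial RNG state by
# capturing .Random.seed once and restoring it before each fit.
run_all <- function(V, r, algorithms) {
  if (!exists(".Random.seed", envir = globalenv(), inherits = FALSE)) {
    runif(1)  # draw once so that .Random.seed exists
  }
  saved <- get(".Random.seed", envir = globalenv())
  lapply(algorithms, function(algo) {
    assign(".Random.seed", saved, envir = globalenv())  # same state each time
    algo(V, r)
  })
}

# Hypothetical stand-in algorithms: each only draws a random starting basis
draw_start <- function(V, r) matrix(runif(nrow(V) * r), nrow(V), r)
algos <- list(first  = draw_start,
              second = function(V, r) 2 * draw_start(V, r))

set.seed(99)
V <- matrix(runif(12), 4, 3)
starts <- run_all(V, 2, algos)
all.equal(starts$second, 2 * starts$first)  # both drew identical numbers
```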
nmf(x = mMatrix,rank = numeric,method = character): Fits an NMF model on x using an algorithm registered with access key method.
Argument method is partially matched against the access keys of all registered algorithms (case insensitive).
Available algorithms are listed in the Algorithms section or in the introduction vignette.
A vector of their names may be retrieved via nmfAlgorithm().
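Partial, case-insensitive matching of the method name can be mimicked with base R's pmatch(), which returns NA for ambiguous or unmatched abbreviations. The helper match_algorithm and its default key list are assumptions for illustration; nmfAlgorithm() returns the keys actually registered.

```r
# Hypothetical helper mimicking partial, case-insensitive matching of an
# algorithm name against registered access keys (assumed list below).
match_algorithm <- function(method,
                            keys = c("brunet", "lee", "ls-nmf", "nsNMF",
                                     "offset", "pe-nmf", "snmf/r", "snmf/l")) {
  i <- pmatch(tolower(method), tolower(keys))  # NA if ambiguous or no match
  if (is.na(i)) stop("no unique algorithm matching '", method, "'")
  keys[i]
}

match_algorithm("BRU")        # "brunet"
match_algorithm("nsnmf")      # "nsNMF"
try(match_algorithm("snmf"))  # error: ambiguous between 'snmf/r' and 'snmf/l'
```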
nmf(x = mMatrix,rank = numeric,method = function): Fits an NMF model on x using a custom algorithm defined by the function method.
The supplied function must have signature (x=matrix, start=NMF, ...) and return an object that inherits from class NMF.
It will be called internally by the workhorse nmf method, with an NMF model to be used as a starting point passed in its argument start.
Extra arguments in ... are passed to method from the top nmf call.
Extra arguments that have no default value in the definition of the function method are required to run the algorithm (e.g. see argument alpha of myfun in the examples).
If the algorithm requires a specific type of NMF model, this can be specified in argument model, which is handled as in the workhorse nmf method (see description for this argument).
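The shape of a custom algorithm can be sketched as follows. To keep the example self-contained, start is assumed to be a plain list(W =, H =) standing in for the NMF model object; with the package you would read and update the factors through the model's accessors (e.g. basis() and coef()) and return the model. The body, Euclidean updates with a hypothetical L1 penalty alpha on H, is invented for illustration; alpha has no default, so the caller must supply it.

```r
# Sketch of a custom algorithm with the (x, start, ...) shape. Assumption:
# 'start' is a plain list(W =, H =) standing in for an NMF model object.
# 'alpha' (a hypothetical L1 penalty on H) has no default, so callers must
# supply it; 'maxit' is optional.
myfun <- function(x, start, alpha, maxit = 50, ...) {
  W <- start$W
  H <- start$H
  for (i in seq_len(maxit)) {
    # Euclidean multiplicative updates; alpha in the denominator shrinks H
    H <- H * crossprod(W, x) / (crossprod(W) %*% H + alpha + 1e-9)
    W <- W * tcrossprod(x, H) / (W %*% tcrossprod(H) + 1e-9)
  }
  start$W <- W
  start$H <- H
  start  # return the updated model
}

set.seed(5)
x <- matrix(runif(20 * 10), 20, 10)
start <- list(W = matrix(runif(20 * 2), 20, 2),
              H = matrix(runif(2 * 10), 2, 10))
res <- myfun(x, start, alpha = 0.1)
```

Calling myfun(x, start) without alpha fails as soon as alpha is evaluated, which mirrors how arguments without defaults become required.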
nmf(x = mMatrix,rank = NMF,method = ANY): Fits an NMF model using the NMF model rank to seed the computation, i.e. as a starting point.
This method is provided for convenience as a shortcut for nmf(x, nbasis(object), method, seed=object, ...).
It discards any value passed in argument seed and uses the NMF model passed in rank instead.
It throws a warning if argument seed is not missing.
If method is missing, this method will call the method nmf,matrix,numeric,NULL, which will infer an algorithm suitable for fitting an NMF model of the class of rank.
nmf(x = mMatrix,rank = NULL,method = ANY): Fits an NMF model using the NMF model supplied in seed to seed the computation, i.e. as a starting point.
This method is provided for completeness and is equivalent to nmf(x, seed, method, ...).
nmf(x = mMatrix,rank = missing,method = ANY): Method defined to ensure the correct dispatch to workhorse methods in case argument rank is missing.
nmf(x = mMatrix,rank = numeric,method = missing): Method defined to ensure the correct dispatch to workhorse methods in case argument method is missing.
nmf(x = mMatrix,rank = matrix,method = ANY): Fits an NMF model, partially seeding the computation with a given matrix passed in rank.
The matrix rank is used as the initial value of either the basis or the mixture coefficient matrix, depending on its dimensions.
Currently, such a partial NMF model is directly used as a seed, meaning that the remaining part is left uninitialised, which not all NMF algorithms accept. This should change in the future, with the missing part of the model drawn from some random distribution.
Amongst built-in algorithms, only ‘snmf/l’ and ‘snmf/r’ support partial seeds, with only the coefficient or basis matrix initialised respectively.
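How a single matrix can be recognised as a partial basis or coefficient seed from its dimensions is sketched below for a target x with n rows and p columns. seed_role is a hypothetical helper, and treating the ambiguous case (a matrix matching both dimensions) as an error is an assumption, not documented package behaviour.

```r
# Hypothetical helper: decide which factor a partial seed matrix m
# initialises, given a target x with n rows and p columns:
#   n x r -> initial basis matrix (W); r x p -> initial coefficient matrix (H).
seed_role <- function(x, m) {
  basis_like <- nrow(m) == nrow(x)
  coef_like  <- ncol(m) == ncol(x)
  if (basis_like && !coef_like) return(list(role = "basis", rank = ncol(m)))
  if (coef_like && !basis_like) return(list(role = "coef", rank = nrow(m)))
  stop("cannot tell from its dimensions which factor 'm' initialises")
}

x <- matrix(runif(20 * 10), 20, 10)
seed_role(x, matrix(runif(20 * 3), 20, 3))$role  # "basis"
seed_role(x, matrix(runif(3 * 10), 3, 10))$role  # "coef"
```

In both cases the factorization rank follows from the other dimension of the seed matrix.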
nmf(x = mMatrix,rank = data.frame,method = ANY): Shortcut for nmf(x, as.matrix(rank), method, ...).
nmf(x = formula,rank = ANY,method = ANY): This method implements the interface for fitting formula-based NMF models.
See nmfModel.
Argument rank: target matrix or formula environment.
If not missing, model must be a list, a data.frame or an environment in which formula variables are searched for.
Brunet J, Tamayo P, Golub TR, Mesirov JP (2004). “Metagenes and molecular pattern discovery using matrix factorization.” _Proceedings of the National Academy of Sciences of the United States of America_, *101*(12), 4164-9. ISSN 0027-8424, doi: 10.1073/pnas.0308531101 (URL: https://doi.org/10.1073/pnas.0308531101).
Lee DD, Seung H (2001). “Algorithms for non-negative matrix factorization.” _Advances in neural information processing systems_. <URL: http://scholar.google.com/scholar?q=intitle:Algorithms+for+non-negative+matrix+factorization\#0>.
Wang G, Kossenkov AV, Ochs MF (2006). “LS-NMF: a modified non-negative matrix factorization algorithm utilizing uncertainty estimates.” _BMC bioinformatics_, *7*, 175. ISSN 1471-2105, doi: 10.1186/1471-2105-7-175 (URL: https://doi.org/10.1186/1471-2105-7-175).
Pascual-Montano A, Carazo JM, Kochi K, Lehmann D, Pascual-marqui RD (2006). “Nonsmooth nonnegative matrix factorization (nsNMF).” _IEEE Trans. Pattern Anal. Mach. Intell_, *28*, 403-415.
Badea L (2008). “Extracting gene expression profiles common to colon and pancreatic adenocarcinoma using simultaneous nonnegative matrix factorization.” _Pacific Symposium on Biocomputing. Pacific Symposium on Biocomputing_, *290*, 267-78. ISSN 1793-5091, <URL: http://www.ncbi.nlm.nih.gov/pubmed/18229692>.
Kim H, Park H (2007). “Sparse non-negative matrix factorizations via alternating non-negativity-constrained least squares for microarray data analysis.” _Bioinformatics (Oxford, England)_, *23*(12), 1495-502. ISSN 1460-2059, doi: 10.1093/bioinformatics/btm134 (URL: https://doi.org/10.1093/bioinformatics/btm134).
Van Benthem MH, Keenan MR (2004). “Fast algorithm for the solution of large-scale non-negativity-constrained least squares problems.” _Journal of Chemometrics_, *18*(10), 441-450. ISSN 0886-9383, doi: 10.1002/cem.889 (URL: https://doi.org/10.1002/cem.889).
library(NMF)
# Only basic calls are presented in this manpage.
# Many more examples are provided in the demo file nmf.R
## Not run:
demo('nmf')
## End(Not run)
# random data
x <- rmatrix(20,10)
# run default algorithm with rank 2
res <- nmf(x, 2)
# specify the algorithm
res <- nmf(x, 2, 'lee')
# get verbose message on what is going on
res <- nmf(x, 2, .options='v')
## Not run:
# more messages
res <- nmf(x, 2, .options='v2')
# even more
res <- nmf(x, 2, .options='v3')
# and so on ...
## End(Not run)