runINMF | R Documentation |
Performs integrative non-negative matrix factorization (iNMF) (J.D. Welch,
2019) using block coordinate descent (alternating non-negative
least squares, ANLS) to return factorized H
, W
, and V
matrices. The objective function is stated as
\arg\min_{H\ge0,W\ge0,V\ge0}\sum_{i}^{d}||E_i-(W+V_i)Hi||^2_F+
\lambda\sum_{i}^{d}||V_iH_i||_F^2
where E_i
is the input non-negative matrix of the i'th dataset, d
is the total number of datasets. E_i
is of size m \times n_i
for
m
variable genes and n_i
cells, H_i
is of size
n_i \times k
, V_i
is of size m \times k
, and W
is of
size m \times k
.
The factorization produces a shared W
matrix (genes by k), and for each
dataset, an H
matrix (k by cells) and a V
matrix (genes by k).
The H
matrices represent the cell factor loadings. W
is held
consistent among all datasets, as it represents the shared components of the
metagenes across datasets. The V
matrices represent the
dataset-specific components of the metagenes.
This function adopts highly optimized fast and memory efficient
implementation extended from Planc (Kannan, 2016). Pre-installation of
extension package RcppPlanc
is required. The underlying algorithm
adopts the identical ANLS strategy as optimizeALS
in the old
version of LIGER.
runINMF(object, k = 20, lambda = 5, ...)
## S3 method for class 'liger'
runINMF(
object,
k = 20,
lambda = 5,
nIteration = 30,
nRandomStarts = 1,
HInit = NULL,
WInit = NULL,
VInit = NULL,
seed = 1,
nCores = 2L,
verbose = getOption("ligerVerbose", TRUE),
...
)
## S3 method for class 'Seurat'
runINMF(
object,
k = 20,
lambda = 5,
datasetVar = "orig.ident",
layer = "ligerScaleData",
assay = NULL,
reduction = "inmf",
nIteration = 30,
nRandomStarts = 1,
HInit = NULL,
WInit = NULL,
VInit = NULL,
seed = 1,
nCores = 2L,
verbose = getOption("ligerVerbose", TRUE),
...
)
object |
A liger object or a Seurat object with
non-negative scaled data of variable features (Done with
|
k |
Inner dimension of factorization (number of factors). Generally, a
higher |
lambda |
Regularization parameter. Larger values penalize
dataset-specific effects more strongly (i.e. alignment should increase as
|
... |
Arguments passed to methods. |
nIteration |
Total number of block coordinate descent iterations to
perform. Default |
nRandomStarts |
Number of restarts to perform (iNMF objective function
is non-convex, so taking the best objective from multiple successive
initialization is recommended). For easier reproducibility, this increments
the random seed by 1 for each consecutive restart, so future factorization
of the same dataset can be run with one rep if necessary. Default |
HInit |
Initial values to use for |
WInit |
Initial values to use for |
VInit |
Initial values to use for |
seed |
Random seed to allow reproducible results. Default |
nCores |
The number of parallel tasks to speed up the computation.
Default |
verbose |
Logical. Whether to show information of the progress. Default
|
datasetVar |
Metadata variable name that stores the dataset source
annotation. Default |
layer |
For Seurat>=4.9.9, the name of layer to retrieve input
non-negative scaled data. Default |
assay |
Name of assay to use. Default |
reduction |
Name of the reduction to store result. Also used as the
feature key. Default |
liger method - Returns updated input liger object
A list of all H
matrices can be accessed with
getMatrix(object, "H")
A list of all V
matrices can be accessed with
getMatrix(object, "V")
The W
matrix can be accessed with
getMatrix(object, "W")
Seurat method - Returns updated input Seurat object
H
matrices for all datasets will be concatenated and
transposed (all cells by k), and form a DimReduc object in the
reductions
slot named by argument reduction
.
W
matrix will be presented as feature.loadings
in the
same DimReduc object.
V
matrices, an objective error value and the dataset
variable used for the factorization is currently stored in
misc
slot of the same DimReduc object.
In the old version implementation, we compute the objective error at the end
of each iteration, and then compares if the algorithm is reaching a
convergence, using an argument thresh
. Now, since the computation of
objective error is indeed expensive, we canceled this feature and directly
runs a default of 30 (nIteration
) iterations, which empirically leads
to a convergence most of the time. Given that the new version is highly
optimized, running this many iteration should be acceptable.
Joshua D. Welch and et al., Single-Cell Multi-omic Integration Compares and Contrasts Features of Brain Cell Identity, Cell, 2019
pbmc <- normalize(pbmc)
pbmc <- selectGenes(pbmc)
pbmc <- scaleNotCenter(pbmc)
if (requireNamespace("RcppPlanc", quietly = TRUE)) {
pbmc <- runINMF(pbmc)
}
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.