onlineINMF | R Documentation |
Performs integrative non-negative matrix factorization (iNMF) (J.D. Welch,
2019, C. Gao, 2021) using online learning approach to return factorized
H
, W
, and V
matrices. The objective function is stated as
\arg\min_{H\ge0,W\ge0,V\ge0}\sum_{i}^{d}||E_i-(W+V_i)Hi||^2_F+
\lambda\sum_{i}^{d}||V_iH_i||_F^2
where E_i
is the input non-negative matrix of the i
'th dataset,
d
is the total number of datasets. E_i
is of size
m \times n_i
for m
features and n_i
sample points,
H_i
is of size k \times n_i
, V_i
is of size
m \times k
, and W
is of size m \times k
.
Different from inmf
which optimizes the objective with ANLS
approach, onlineINMF
optimizes the same objective with online learning
strategy, where it updates mini-batches of H_i
solving the NNLS
problem, and updates V_i
and W
with HALS multiplicative method.
This function allows online learning in 3 scenarios:
Fully observed datasets;
Iterative refinement using continually arriving datasets;
Projection of new datasets without updating the existing factorization
onlineINMF(
objectList,
newDatasets = NULL,
project = FALSE,
k = 20,
lambda = 5,
maxEpoch = 5,
minibatchSize = 5000,
maxHALSIter = 1,
permuteChunkSize = 1000,
nCores = 2,
Hinit = NULL,
Vinit = NULL,
Winit = NULL,
Ainit = NULL,
Binit = NULL,
verbose = FALSE
)
objectList |
list of input datasets. List elements should all be of the same class. Viable classes include: matrix, dgCMatrix, H5Mat, H5SpMat. |
newDatasets |
Same requirements as for new arriving datasets. Default
|
project |
Logical scalar, whether to run scenario 3. See description.
Default |
k |
Integer. Inner dimensionality to factorize the datasets into.
Default |
lambda |
Regularization parameter. Larger values penalize
dataset-specific effects more strongly (i.e. alignment should increase as
|
maxEpoch |
The number of epochs to iterate through. Default |
minibatchSize |
Total number of cells in each mini-batch. Default
|
maxHALSIter |
Maximum number of block coordinate descent (HALS
algorithm) iterations to perform for each update of |
permuteChunkSize |
Number of cells in a chunk being shuffled before
subsetting to minibatches. Only appliable to in-memory data and for Scenario
1 and 2. Default |
nCores |
The number of parallel tasks that will be spawned.
Default |
Hinit , Vinit , Winit , Ainit , Binit |
Pass the previous factorization result
for datasets existing in |
verbose |
Logical scalar. Whether to show information and progress.
Default |
A list of the following elements:
H
- a list of result H_i
matrices of size
n_i \times k
V
- a list of result V_i
matrices
W
- the result W
matrix
A
- a list of result A_i
matrices, k \times k
B
- a list of result B_i
matrices, m \times k
objErr
- the final objective error value.
Yichen Wang
Joshua D. Welch and et al., Single-Cell Multi-omic Integration Compares and Contrasts Features of Brain Cell Identity, Cell, 2019
Chao Gao and et al., Iterative single-cell multi-omic integration using online learning, Nat Biotechnol., 2021
library(Matrix)
# Scenario 1 with sparse matrices
set.seed(1)
res1 <- onlineINMF(list(ctrl.sparse, stim.sparse),
minibatchSize = 50, k = 10, verbose = FALSE)
# Scenario 2 with H5 dense matrices
h5dense1 <- H5Mat(filename = system.file("extdata", "ctrl_dense.h5",
package = "RcppPlanc", mustWork = TRUE),
dataPath = "scaleData")
h5dense2 <- H5Mat(filename = system.file("extdata", "stim_dense.h5",
package = "RcppPlanc", mustWork = TRUE),
dataPath = "scaleData")
res2 <- onlineINMF(list(ctrl = h5dense1), minibatchSize = 50, k = 10, verbose = FALSE)
res3 <- onlineINMF(list(ctrl = h5dense1),
newDatasets = list(stim = h5dense2),
Hinit = res2$H, Vinit = res2$V, Winit = res2$W,
Ainit = res2$A, Binit = res2$B,
minibatchSize = 50, k = 10, verbose = FALSE)
# Scenario 3 with H5 sparse matrices
h5sparse1 <- H5SpMat(filename = system.file("extdata", "ctrl_sparse.h5",
package = "RcppPlanc", mustWork = TRUE),
valuePath = "scaleDataSparse/data",
rowindPath = "scaleDataSparse/indices",
colptrPath = "scaleDataSparse/indptr",
nrow = nrow(ctrl.sparse),
ncol = ncol(ctrl.sparse))
h5sparse2 <- H5SpMat(filename = system.file("extdata", "stim_sparse.h5",
package = "RcppPlanc", mustWork = TRUE),
valuePath = "scaleDataSparse/data",
rowindPath = "scaleDataSparse/indices",
colptrPath = "scaleDataSparse/indptr",
nrow = nrow(stim.sparse),
ncol = nrow(stim.sparse))
res4 <- onlineINMF(list(ctrl = h5sparse1), minibatchSize = 50, k = 10, verbose = FALSE)
res5 <- onlineINMF(list(ctrl = h5sparse1),
newDatasets = list(stim = h5sparse2), project = TRUE,
Hinit = res4$H, Vinit = res4$V, Winit = res4$W,
Ainit = res4$A, Binit = res4$B,
minibatchSize = 50, k = 10, verbose = FALSE)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.