onlineINMF: Perform Integrative Non-negative Matrix Factorization Using...

View source: R/INMF.R

onlineINMFR Documentation

Perform Integrative Non-negative Matrix Factorization Using Online Learning

Description

Performs integrative non-negative matrix factorization (iNMF) (J.D. Welch, 2019, C. Gao, 2021) using online learning approach to return factorized H, W, and V matrices. The objective function is stated as

\arg\min_{H\ge0,W\ge0,V\ge0}\sum_{i}^{d}||E_i-(W+V_i)Hi||^2_F+ \lambda\sum_{i}^{d}||V_iH_i||_F^2

where E_i is the input non-negative matrix of the i'th dataset, d is the total number of datasets. E_i is of size m \times n_i for m features and n_i sample points, H_i is of size k \times n_i, V_i is of size m \times k, and W is of size m \times k.

Different from inmf which optimizes the objective with ANLS approach, onlineINMF optimizes the same objective with online learning strategy, where it updates mini-batches of H_i solving the NNLS problem, and updates V_i and W with HALS multiplicative method.

This function allows online learning in 3 scenarios:

  1. Fully observed datasets;

  2. Iterative refinement using continually arriving datasets;

  3. Projection of new datasets without updating the existing factorization

Usage

onlineINMF(
  objectList,
  newDatasets = NULL,
  project = FALSE,
  k = 20,
  lambda = 5,
  maxEpoch = 5,
  minibatchSize = 5000,
  maxHALSIter = 1,
  permuteChunkSize = 1000,
  nCores = 2,
  Hinit = NULL,
  Vinit = NULL,
  Winit = NULL,
  Ainit = NULL,
  Binit = NULL,
  verbose = FALSE
)

Arguments

objectList

list of input datasets. List elements should all be of the same class. Viable classes include: matrix, dgCMatrix, H5Mat, H5SpMat.

newDatasets

Same requirements as for new arriving datasets. Default NULL for scenario 1, specify for scenario 2 or 3.

project

Logical scalar, whether to run scenario 3. See description. Default FALSE.

k

Integer. Inner dimensionality to factorize the datasets into. Default 20.

lambda

Regularization parameter. Larger values penalize dataset-specific effects more strongly (i.e. alignment should increase as lambda increases). Default 5.

maxEpoch

The number of epochs to iterate through. Default 5.

minibatchSize

Total number of cells in each mini-batch. Default 5000.

maxHALSIter

Maximum number of block coordinate descent (HALS algorithm) iterations to perform for each update of W and V. Default 1. Changing this parameter is not recommended.

permuteChunkSize

Number of cells in a chunk being shuffled before subsetting to minibatches. Only appliable to in-memory data and for Scenario 1 and 2. Default 1000.

nCores

The number of parallel tasks that will be spawned. Default 2

Hinit, Vinit, Winit, Ainit, Binit

Pass the previous factorization result for datasets existing in objectList, in order to run scenario 2 or 3. All should have length(objectList) matrices inside. See description for dimensionality of H_i, V_i and W_i. A_i should be of size k \times k and B_i should be of size m \times k

verbose

Logical scalar. Whether to show information and progress. Default FALSE.

Value

A list of the following elements:

  • H - a list of result H_i matrices of size n_i \times k

  • V - a list of result V_i matrices

  • W - the result W matrix

  • A - a list of result A_i matrices, k \times k

  • B - a list of result B_i matrices, m \times k

  • objErr - the final objective error value.

Author(s)

Yichen Wang

References

Joshua D. Welch and et al., Single-Cell Multi-omic Integration Compares and Contrasts Features of Brain Cell Identity, Cell, 2019

Chao Gao and et al., Iterative single-cell multi-omic integration using online learning, Nat Biotechnol., 2021

Examples

library(Matrix)

# Scenario 1 with sparse matrices
set.seed(1)
res1 <- onlineINMF(list(ctrl.sparse, stim.sparse),
                   minibatchSize = 50, k = 10, verbose = FALSE)

# Scenario 2 with H5 dense matrices
h5dense1 <- H5Mat(filename = system.file("extdata", "ctrl_dense.h5",
                             package = "RcppPlanc", mustWork = TRUE),
                                          dataPath = "scaleData")
h5dense2 <- H5Mat(filename = system.file("extdata", "stim_dense.h5",
                             package = "RcppPlanc", mustWork = TRUE),
                                          dataPath = "scaleData")
res2 <- onlineINMF(list(ctrl = h5dense1), minibatchSize = 50, k = 10, verbose = FALSE)
res3 <- onlineINMF(list(ctrl = h5dense1),
                   newDatasets = list(stim = h5dense2),
                   Hinit = res2$H, Vinit = res2$V, Winit = res2$W,
                   Ainit = res2$A, Binit = res2$B,
                   minibatchSize = 50, k = 10, verbose = FALSE)

# Scenario 3 with H5 sparse matrices
h5sparse1 <- H5SpMat(filename = system.file("extdata", "ctrl_sparse.h5",
                                package = "RcppPlanc", mustWork = TRUE),
                                valuePath = "scaleDataSparse/data",
                                rowindPath = "scaleDataSparse/indices",
                                colptrPath = "scaleDataSparse/indptr",
                                nrow = nrow(ctrl.sparse),
                                ncol = ncol(ctrl.sparse))
h5sparse2 <- H5SpMat(filename = system.file("extdata", "stim_sparse.h5",
                                package = "RcppPlanc", mustWork = TRUE),
                                valuePath = "scaleDataSparse/data",
                                rowindPath = "scaleDataSparse/indices",
                                colptrPath = "scaleDataSparse/indptr",
                                nrow = nrow(stim.sparse),
                                ncol = nrow(stim.sparse))
res4 <- onlineINMF(list(ctrl = h5sparse1), minibatchSize = 50, k = 10, verbose = FALSE)
res5 <- onlineINMF(list(ctrl = h5sparse1),
                   newDatasets = list(stim = h5sparse2), project = TRUE,
                   Hinit = res4$H, Vinit = res4$V, Winit = res4$W,
                   Ainit = res4$A, Binit = res4$B,
                   minibatchSize = 50, k = 10, verbose = FALSE)


RcppPlanc documentation built on April 15, 2025, 1:11 a.m.