online_iNMF: Perform online iNMF on scaled datasets

online_iNMF    R Documentation

Description

Perform online integrative non-negative matrix factorization (iNMF) to represent multiple single-cell datasets in terms of H, W, and V matrices. The function optimizes the iNMF objective using online learning (non-negative least squares for the H matrices, hierarchical alternating least squares for the W and V matrices), with the number of factors set by k. Online learning is supported in three scenarios: (1) fully observed datasets; (2) iterative refinement using continually arriving datasets; and (3) projection of new datasets without updating the existing factorization. All three scenarios require a fixed amount of memory, independent of the number of cells.

For each dataset, the factorization produces an H matrix (cells by k) and a V matrix (k by genes), along with a single W matrix (k by genes) shared by all datasets. The H matrices hold the cell factor loadings. W is identical across datasets because it represents the shared components of the metagenes. The V matrices represent the dataset-specific components of the metagenes.
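To make these shapes concrete, the following base-R sketch builds random non-negative matrices with the dimensions described above and evaluates the iNMF objective in the form given in the LIGER literature: the sum over datasets of ||X_i - H_i (W + V_i)||_F^2 + lambda * ||H_i V_i||_F^2. The toy sizes, the dataset names, and the helper `inmf_objective` are illustrative assumptions and are not part of rliger.

```r
# Illustrative only: toy dimensions and helper names are assumptions.
set.seed(123)
k <- 3
n_genes <- 8
n_cells <- c(ctrl = 10, stim = 12)

# Random non-negative matrix with the requested shape
rand_nn <- function(nr, nc) matrix(runif(nr * nc), nrow = nr, ncol = nc)

W <- rand_nn(k, n_genes)                               # shared, k by genes
V <- lapply(n_cells, function(n) rand_nn(k, n_genes))  # per dataset, k by genes
H <- lapply(n_cells, function(n) rand_nn(n, k))        # per dataset, cells by k
X <- lapply(n_cells, function(n) rand_nn(n, n_genes))  # toy data, cells by genes

# iNMF objective summed over datasets
inmf_objective <- function(X, H, W, V, lambda = 5) {
  terms <- mapply(function(x, h, v) {
    recon <- sum((x - h %*% (W + v))^2)   # squared Frobenius reconstruction error
    penalty <- lambda * sum((h %*% v)^2)  # penalty on the dataset-specific part
    recon + penalty
  }, X, H, V)
  sum(terms)
}

obj <- inmf_objective(X, H, W, V, lambda = 5)
```

Because the penalty term scales with lambda, a larger lambda makes any non-zero dataset-specific component H_i V_i more costly, which is the sense in which it penalizes dataset-specific effects.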

Usage

online_iNMF(
  object,
  X_new = NULL,
  projection = FALSE,
  W.init = NULL,
  V.init = NULL,
  H.init = NULL,
  A.init = NULL,
  B.init = NULL,
  k = 20,
  lambda = 5,
  max.epochs = 5,
  miniBatch_max_iters = 1,
  miniBatch_size = 5000,
  h5_chunk_size = 1000,
  seed = 123,
  verbose = TRUE
)

Arguments

object

liger object with data stored in HDF5 files. The data should be normalized, have genes selected, and be scaled before calling this function.

X_new

List of new datasets for scenario 2 or scenario 3. Each list element should be the name of an HDF5 file.

projection

Perform data integration by shared metagene (W) projection (scenario 3). (default FALSE)

W.init

Optional initialization for W. (default NULL)

V.init

Optional initialization for V (default NULL)

H.init

Optional initialization for H (default NULL)

A.init

Optional initialization for A (default NULL)

B.init

Optional initialization for B (default NULL)

k

Inner dimension of the factorization, i.e. the number of metagenes (default 20). A value in the range 20-50 works well for most analyses.

lambda

Regularization parameter. Larger values penalize dataset-specific effects more strongly (i.e., alignment should increase as lambda increases). We recommend always using the default value, except possibly for analyses with relatively small differences (biological replicates, male/female comparisons, etc.), in which case a lower value such as 1.0 may improve reconstruction quality (default 5.0).

max.epochs

Maximum number of epochs (complete passes through the data). (default 5)

miniBatch_max_iters

Maximum number of block coordinate descent (HALS algorithm) iterations to perform for each update of W and V (default 1). Changing this parameter is not recommended.

miniBatch_size

Total number of cells in each minibatch (default 5000). This is a reasonable default, but a smaller value such as 1000 may be necessary for analyzing very small datasets. In general, minibatch size should be no larger than the number of cells in the smallest dataset.

h5_chunk_size

Chunk size of the input HDF5 files (default 1000). The chunk size should be no larger than the minibatch size.

seed

Random seed to allow reproducible results (default 123).

verbose

Print progress bar/messages (default TRUE).
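The size constraints stated for miniBatch_size and h5_chunk_size can be checked before calling the function. The sketch below is a hypothetical base-R helper, not part of rliger: it caps the minibatch size at the number of cells in the smallest dataset and caps the chunk size at the minibatch size. The per-dataset cell counts are made-up inputs, and whether a different chunk size is actually appropriate depends on how the HDF5 files were written.

```r
# Hypothetical helper (not part of rliger): pick sizes that respect the
# constraints stated in the argument descriptions above.
choose_batch_sizes <- function(n_cells, batch_default = 5000, chunk_default = 1000) {
  stopifnot(length(n_cells) > 0, all(n_cells > 0))
  batch <- min(batch_default, min(n_cells))  # no larger than the smallest dataset
  chunk <- min(chunk_default, batch)         # no larger than the minibatch size
  list(miniBatch_size = batch, h5_chunk_size = chunk)
}

# Made-up cell counts for two small datasets
sizes <- choose_batch_sizes(c(ctrl = 300, stim = 450))
```

With large datasets the helper simply returns the documented defaults; with small ones it shrinks both values to the size of the smallest dataset.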

Value

liger object with H, W, V, A and B slots set.

Examples

ligerex <- createLiger(list(ctrl = ctrl, stim = stim))
if (length(ligerex@h5file.info) > 0) {
    # This function only works for HDF5-based liger objects
    ligerex <- normalize(ligerex)
    ligerex <- selectGenes(ligerex)
    ligerex <- scaleNotCenter(ligerex)
    # `miniBatch_size` has to be no larger than the number of cells in the smallest dataset
    ligerex <- online_iNMF(ligerex, miniBatch_size = 100)
}

rliger documentation built on Nov. 9, 2023, 1:07 a.m.