online_iNMF: Perform online iNMF on scaled datasets
In rliger: Linked Inference of Genomic Experimental Relationships

online_iNMF

R Documentation

Description

Perform online integrative non-negative matrix factorization (iNMF) to represent multiple single-cell datasets in terms of H, W, and V matrices. The iNMF objective function is optimized by online learning (non-negative least squares for the H matrices, hierarchical alternating least squares for the W and V matrices), with the number of factors set by k. Online learning is supported in three scenarios: (1) fully observed datasets; (2) iterative refinement using continually arriving datasets; and (3) projection of new datasets without updating the existing factorization. All three scenarios use a fixed amount of memory, independent of the number of cells.

For each dataset, this factorization produces an H matrix (cells by k), a V matrix (k by genes), and a shared W matrix (k by genes). The H matrices represent the cell factor loadings. W is identical among all datasets, as it represents the shared components of the metagenes across datasets. The V matrices represent the dataset-specific components of the metagenes.
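
The shapes described above can be made concrete with a small base R sketch. It does not call rliger; the toy dimensions, the random matrices, and the helper name `inmf_objective` are illustrative assumptions. The objective it evaluates is the usual iNMF form (squared reconstruction error of `H (W + V)` plus `lambda` times the squared norm of `H V`), which this page does not spell out, so treat it as an assumption about what the function minimizes.

```r
# Toy dimensions (illustrative values, not taken from rliger)
set.seed(1)
n_cells <- c(ctrl = 30, stim = 40)  # cells per dataset
n_genes <- 50
k <- 5
lambda <- 5

# Shared metagenes W (k by genes), identical across datasets
W <- matrix(runif(k * n_genes), nrow = k)

# Per-dataset cell factor loadings H (cells by k)
# and dataset-specific metagenes V (k by genes)
H <- lapply(n_cells, function(n) matrix(runif(n * k), nrow = n))
V <- lapply(n_cells, function(n) matrix(runif(k * n_genes), nrow = k))

# Synthetic non-negative data X (cells by genes), built so that
# X = H (W + V) holds exactly for each dataset
X <- lapply(names(n_cells), function(d) H[[d]] %*% (W + V[[d]]))
names(X) <- names(n_cells)

# Assumed iNMF objective:
#   sum_i ||X_i - H_i (W + V_i)||_F^2 + lambda * sum_i ||H_i V_i||_F^2
inmf_objective <- function(X, H, W, V, lambda) {
  per_dataset <- vapply(names(X), function(d) {
    recon <- H[[d]] %*% (W + V[[d]])
    sum((X[[d]] - recon)^2) + lambda * sum((H[[d]] %*% V[[d]])^2)
  }, numeric(1))
  sum(per_dataset)
}

obj <- inmf_objective(X, H, W, V, lambda)
```

Because the toy data are generated exactly from `H (W + V)`, the reconstruction term is zero here and `obj` consists only of the `lambda` penalty on the dataset-specific part.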

Usage

online_iNMF(
  object,
  X_new = NULL,
  projection = FALSE,
  W.init = NULL,
  V.init = NULL,
  H.init = NULL,
  A.init = NULL,
  B.init = NULL,
  k = 20,
  lambda = 5,
  max.epochs = 5,
  miniBatch_max_iters = 1,
  miniBatch_size = 5000,
  h5_chunk_size = 1000,
  seed = 123,
  verbose = TRUE
)

Arguments

`object`	`liger` object with data stored in HDF5 files. The data should be normalized, have variable genes selected, and be scaled before calling this function.
`X_new`	List of new datasets for scenario 2 or scenario 3. Each list element should be the name of an HDF5 file.
`projection`	Perform data integration by shared metagene (W) projection (scenario 3). (default FALSE)
`W.init`	Optional initialization for W (default NULL).
`V.init`	Optional initialization for V (default NULL).
`H.init`	Optional initialization for H (default NULL).
`A.init`	Optional initialization for A (default NULL).
`B.init`	Optional initialization for B (default NULL).
`k`	Inner dimension of the factorization, i.e., the number of metagenes (default 20). A value in the range 20-50 works well for most analyses.
`lambda`	Regularization parameter. Larger values penalize dataset-specific effects more strongly (i.e., alignment should increase as lambda increases). We recommend always using the default value, except possibly for analyses with relatively small differences between datasets (biological replicates, male/female comparisons, etc.), in which case a lower value such as 1.0 may improve reconstruction quality (default 5.0).
`max.epochs`	Maximum number of epochs (complete passes through the data). (default 5)
`miniBatch_max_iters`	Maximum number of block coordinate descent (HALS algorithm) iterations to perform for each update of W and V (default 1). Changing this parameter is not recommended.
`miniBatch_size`	Total number of cells in each minibatch (default 5000). This is a reasonable default, but a smaller value such as 1000 may be necessary for analyzing very small datasets. In general, minibatch size should be no larger than the number of cells in the smallest dataset.
`h5_chunk_size`	Chunk size of the input HDF5 files (default 1000). The chunk size should be no larger than the minibatch size (`miniBatch_size`).
`seed`	Random seed to allow reproducible results (default 123).
`verbose`	Print progress bar/messages (TRUE by default)
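
The two size constraints stated for `miniBatch_size` and `h5_chunk_size` can be checked before a run. The helper below is a minimal sketch in base R; `check_batch_sizes` and its `n_cells` argument are hypothetical names introduced here and are not part of rliger.

```r
# Hypothetical helper (not part of rliger): returns a character vector of
# problems, or an empty vector if both documented constraints are satisfied.
# `n_cells` is the number of cells in each dataset.
check_batch_sizes <- function(n_cells, miniBatch_size = 5000, h5_chunk_size = 1000) {
  problems <- character(0)
  smallest <- min(n_cells)
  # The minibatch should be no larger than the smallest dataset
  if (miniBatch_size > smallest) {
    problems <- c(problems, sprintf(
      "miniBatch_size (%d) exceeds the smallest dataset (%d cells)",
      as.integer(miniBatch_size), as.integer(smallest)
    ))
  }
  # The HDF5 chunk size should be no larger than the minibatch size
  if (h5_chunk_size > miniBatch_size) {
    problems <- c(problems, sprintf(
      "h5_chunk_size (%d) exceeds miniBatch_size (%d)",
      as.integer(h5_chunk_size), as.integer(miniBatch_size)
    ))
  }
  problems
}

# Two datasets of 300 cells each: the default minibatch of 5000 is too large
check_batch_sizes(c(ctrl = 300, stim = 300))

# Smaller minibatch and chunk size satisfy both constraints
check_batch_sizes(c(ctrl = 300, stim = 300), miniBatch_size = 100, h5_chunk_size = 50)
```

Returning the list of problems, rather than stopping at the first one, lets a caller report every violated constraint at once.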

Value

liger object with H, W, V, A and B slots set.

Examples

library(rliger)
# `ctrl` and `stim` are example datasets shipped with rliger
ligerex <- createLiger(list(ctrl = ctrl, stim = stim))
if (length(ligerex@h5file.info) > 0) {
    # online_iNMF() only works on HDF5-based liger objects
    ligerex <- normalize(ligerex)
    ligerex <- selectGenes(ligerex)
    ligerex <- scaleNotCenter(ligerex)
    # `miniBatch_size` has to be no larger than the number of cells in the smallest dataset
    ligerex <- online_iNMF(ligerex, miniBatch_size = 100)
}

rliger documentation built on Nov. 9, 2023, 1:07 a.m.