online_iNMF-deprecated: [Deprecated] Perform online iNMF on scaled datasets
In rliger: Linked Inference of Genomic Experimental Relationships

Perform online iNMF on scaled datasets

Description

This function is deprecated. Please use runOnlineINMF or runIntegration instead.

Perform online integrative non-negative matrix factorization to represent multiple single-cell datasets in terms of H, W, and V matrices. It optimizes the iNMF objective function using online learning (non-negative least squares for H matrix, hierarchical alternating least squares for W and V matrices), where the number of factors is set by k. The function allows online learning in 3 scenarios: (1) fully observed datasets; (2) iterative refinement using continually arriving datasets; and (3) projection of new datasets without updating the existing factorization. All three scenarios require fixed memory independent of the number of cells.
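The fixed-memory property described above can be illustrated with a minimal base R sketch. It assumes, purely for illustration, that what is kept between minibatches are small cross-product summaries of the form t(H) %*% H (k by k) and t(H) %*% X (k by genes); whether the A and B matrices of this function are defined exactly this way is not stated on this page. All variable names and sizes below are hypothetical.

```r
# Illustration only: summaries accumulated over minibatches have a size
# that depends on k and the number of genes, not on the number of cells.
set.seed(1)
n_cells <- 1000
n_genes <- 20
k <- 4
batch_size <- 100  # plays the role of miniBatch_size

H <- matrix(runif(n_cells * k), nrow = n_cells)        # cells by k
X <- matrix(runif(n_cells * n_genes), nrow = n_cells)  # cells by genes

A <- matrix(0, k, k)        # running sum of t(H_b) %*% H_b
B <- matrix(0, k, n_genes)  # running sum of t(H_b) %*% X_b

for (s in seq(1, n_cells, by = batch_size)) {
  idx <- s:min(s + batch_size - 1, n_cells)
  H_b <- H[idx, , drop = FALSE]
  X_b <- X[idx, , drop = FALSE]
  A <- A + crossprod(H_b)       # k by k, regardless of batch size
  B <- B + crossprod(H_b, X_b)  # k by genes, regardless of batch size
}

# The accumulated summaries match the ones computed from all cells at once.
same_A <- isTRUE(all.equal(A, crossprod(H)))
same_B <- isTRUE(all.equal(B, crossprod(H, X)))
```

Only one minibatch of cells needs to be held in memory at a time; the summaries themselves stay k by k and k by genes however many cells are processed.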

For each dataset, this factorization produces an H matrix (cells by k), a V matrix (k by genes), and a shared W matrix (k by genes). The H matrices represent the cell factor loadings. W is identical among all datasets, as it represents the shared components of the metagenes across datasets. The V matrices represent the dataset-specific components of the metagenes.
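The matrix shapes above can be made concrete with a small base R sketch. It builds toy H, V, and W matrices and evaluates the iNMF objective in its commonly published form, the sum over datasets of ||X_i - H_i (W + V_i)||_F^2 + lambda * ||H_i V_i||_F^2. The sizes, the random values, and the helper name `inmf_objective` are assumptions made for illustration and are not part of rliger.

```r
# Toy dimensions (hypothetical values, for illustration only)
set.seed(1)
n_cells <- c(30, 40)  # cells in dataset 1 and dataset 2
n_genes <- 50
k <- 5
lambda <- 5

# Shared metagenes W (k by genes), identical for all datasets
W <- matrix(runif(k * n_genes), nrow = k)

# Per-dataset cell loadings H_i (cells by k) and metagenes V_i (k by genes)
H <- lapply(n_cells, function(n) matrix(runif(n * k), nrow = n))
V <- lapply(n_cells, function(n) matrix(runif(k * n_genes), nrow = k))

# Synthetic data generated exactly from the model: X_i = H_i (W + V_i)
X <- Map(function(h, v) h %*% (W + v), H, V)

# Sum over datasets of reconstruction error plus the lambda-weighted
# penalty on the dataset-specific part H_i V_i
inmf_objective <- function(X, H, W, V, lambda) {
  terms <- Map(function(x, h, v) {
    sum((x - h %*% (W + v))^2) + lambda * sum((h %*% v)^2)
  }, X, H, V)
  sum(unlist(terms))
}

obj <- inmf_objective(X, H, W, V, lambda)
```

Because the toy data are generated exactly from the model, the reconstruction term is zero here and the objective reduces to the penalty on the dataset-specific components, which is the part that grows with lambda.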

Arguments

`object`	`liger` object with data stored in HDF5 files. The data should be normalized, have variable genes selected, and be scaled before calling this function.
`X_new`	List of new datasets for scenario 2 or scenario 3. Each list element should be the name of an HDF5 file.
`projection`	Perform data integration by shared metagene (W) projection (scenario 3). (default FALSE)
`W.init`	Optional initialization for W. (default NULL)
`V.init`	Optional initialization for V. (default NULL)
`H.init`	Optional initialization for H. (default NULL)
`A.init`	Optional initialization for A. (default NULL)
`B.init`	Optional initialization for B. (default NULL)
`k`	Inner dimension of the factorization, i.e. the number of metagenes (default 20). A value in the range 20-50 works well for most analyses.
`lambda`	Regularization parameter (default 5.0). Larger values penalize dataset-specific effects more strongly (i.e., alignment should increase as lambda increases). We recommend always using the default value, except possibly for analyses with relatively small differences (biological replicates, male/female comparisons, etc.), in which case a lower value such as 1.0 may improve reconstruction quality.
`max.epochs`	Maximum number of epochs (complete passes through the data). (default 5)
`miniBatch_max_iters`	Maximum number of block coordinate descent (HALS algorithm) iterations to perform for each update of W and V (default 1). Changing this parameter is not recommended.
`miniBatch_size`	Total number of cells in each minibatch (default 5000). This is a reasonable default, but a smaller value such as 1000 may be necessary for analyzing very small datasets. In general, minibatch size should be no larger than the number of cells in the smallest dataset.
`h5_chunk_size`	Chunk size of the input HDF5 files (default 1000). The chunk size should be no larger than the minibatch size.
`seed`	Random seed to allow reproducible results (default 123).
`verbose`	Whether to print a progress bar and messages (default TRUE).
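The size guidance given for `miniBatch_size` and `h5_chunk_size` can be checked before a run with a small helper. The sketch below is a hypothetical function, `check_online_sizes`, written in base R; it is not part of rliger, and it assumes the number of cells per dataset is already known to the caller.

```r
# Hypothetical helper (not part of rliger): checks that the minibatch is no
# larger than the smallest dataset and that the HDF5 chunk is no larger than
# the minibatch. Returns a character vector of problems (empty if none).
check_online_sizes <- function(cells_per_dataset,
                               miniBatch_size = 5000,
                               h5_chunk_size = 1000) {
  problems <- character(0)
  smallest <- min(cells_per_dataset)
  if (miniBatch_size > smallest) {
    problems <- c(problems, sprintf(
      "miniBatch_size (%d) exceeds the smallest dataset (%d cells)",
      as.integer(miniBatch_size), as.integer(smallest)))
  }
  if (h5_chunk_size > miniBatch_size) {
    problems <- c(problems, sprintf(
      "h5_chunk_size (%d) exceeds miniBatch_size (%d)",
      as.integer(h5_chunk_size), as.integer(miniBatch_size)))
  }
  problems
}

# Defaults with a 3000-cell dataset: the minibatch is too large.
with_defaults <- check_online_sizes(c(3000, 12000))

# Lowering the minibatch size to 1000 satisfies both conditions.
with_smaller_batch <- check_online_sizes(c(3000, 12000), miniBatch_size = 1000)
```

Returning the list of problems instead of stopping leaves it to the caller to decide whether to adjust the parameters or proceed anyway.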