optimizeNewData: Perform factorization for new data


optimizeNewData    R Documentation

Perform factorization for new data

Description

Updates an existing factorization with new data efficiently by reusing the information already in that factorization. Assumes that the variable features are present in the new datasets. Two modes are supported (controlled by merge):

  • Append new data to the existing datasets specified by useDatasets. Here the existing V matrices of the target datasets are used directly as initialization, and new H matrices for the merged matrices are initialized accordingly.

  • Add the new data as new datasets. Their initial V matrices are copied from the datasets specified by useDatasets, and new H matrices are initialized accordingly.

Usage

optimizeNewData(
  object,
  dataNew,
  useDatasets,
  merge = TRUE,
  lambda = NULL,
  nIteration = 30,
  seed = 1,
  verbose = getOption("ligerVerbose"),
  new.data = dataNew,
  which.datasets = useDatasets,
  add.to.existing = merge,
  max.iters = nIteration,
  thresh = NULL
)

Arguments

object

A liger object. Integrative factorization (e.g. with runINMF) must have been performed in advance.

dataNew

Named list of raw count matrices, genes by cells.

useDatasets

Selection of datasets to append new data to if merge = TRUE, or the datasets to inherit V matrices from and initialize the optimization when merge = FALSE. Should match the length and order of dataNew.

merge

Logical, whether to add the new data to existing datasets or to treat it as entirely new datasets (i.e. calculate new V matrices). Default TRUE.

lambda

Numeric regularization parameter. Default NULL uses the lambda value from the latest factorization.

nIteration

Number of block coordinate descent iterations to perform. Default 30.

seed

Random seed to allow reproducible results. Default 1. Used by runINMF factorization.

verbose

Logical. Whether to show progress information. Default getOption("ligerVerbose"), which is TRUE if the user has not set it.

new.data, which.datasets, add.to.existing, max.iters

These arguments have been replaced by others and will be removed in the future. See Usage for the replacements.

thresh

Deprecated. The new implementation of iNMF does not require a threshold for convergence detection. Setting a large enough nIteration will bring it to convergence.
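
The requirements on dataNew and useDatasets described above (a named list of genes-by-cells matrices, with useDatasets matching its length and order) can be checked before the call. The following is a minimal base R sketch, not part of rliger: the helper name checkNewDataInput is hypothetical, and it only inspects the new data itself, not whether cell identifiers clash with those already in the object.

```r
# Hypothetical helper (NOT an rliger function): basic sanity checks on the
# inputs intended for optimizeNewData().
checkNewDataInput <- function(dataNew, useDatasets) {
    # dataNew must be a list in which every element has a name
    if (!is.list(dataNew) || is.null(names(dataNew)) ||
        any(names(dataNew) == "")) {
        stop("`dataNew` must be a named list of matrices.")
    }
    # One entry of useDatasets per new matrix, in the same order
    if (length(dataNew) != length(useDatasets)) {
        stop("`useDatasets` must have one entry per element of `dataNew`.")
    }
    for (name in names(dataNew)) {
        mat <- dataNew[[name]]
        # Genes are expected in rows, cells in columns
        if (is.null(rownames(mat)) || is.null(colnames(mat))) {
            stop("Matrix '", name, "' needs gene rownames and cell colnames.")
        }
        # Cell identifiers must be unique within the matrix
        if (anyDuplicated(colnames(mat)) > 0) {
            stop("Matrix '", name, "' has duplicated cell identifiers.")
        }
    }
    invisible(TRUE)
}

# Toy genes-by-cells count matrix: 3 genes, 2 cells
toy <- matrix(c(0, 1, 2, 3, 0, 5), nrow = 3,
              dimnames = list(paste0("gene", 1:3), paste0("cell", 1:2)))
checkNewDataInput(list(ctrl2 = toy), useDatasets = "ctrl")
```

The helper stops with an error on the first violated requirement and returns TRUE invisibly otherwise, so it can be placed directly before a call to optimizeNewData.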

Value

The input object with its W slot updated with the new W matrix, and the H and V slots of each ligerDataset object in the datasets slot updated with the new dataset-specific H and V matrices, respectively.

See Also

runINMF, optimizeNewK, optimizeNewLambda

Examples

pbmc <- normalize(pbmc)
pbmc <- selectGenes(pbmc)
pbmc <- scaleNotCenter(pbmc)
# Only running a few iterations for fast examples
if (requireNamespace("RcppPlanc", quietly = TRUE)) {
    pbmc <- runINMF(pbmc, k = 20, nIteration = 2)
    # Create fake new data by increasing all non-zero counts in "ctrl" by 1,
    # and make the cell identifiers unique
    ctrl2 <- rawData(dataset(pbmc, "ctrl"))
    ctrl2@x <- ctrl2@x + 1
    colnames(ctrl2) <- paste0(colnames(ctrl2), 2)
    pbmcNew <- optimizeNewData(pbmc, dataNew = list(ctrl2 = ctrl2),
                               useDatasets = "ctrl", nIteration = 2)
}
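
The example above uses the default merge = TRUE, which appends the new cells to the existing "ctrl" dataset. A minimal sketch of the other mode, merge = FALSE, could look as follows. It assumes the pbmc example data shipped with rliger and an installed RcppPlanc, and reuses only the functions and arguments shown on this page; the list name "ctrl2" is chosen here for illustration.

```r
library(rliger)

# Same preprocessing and (short) factorization as in the example above
pbmc <- normalize(pbmc)
pbmc <- selectGenes(pbmc)
pbmc <- scaleNotCenter(pbmc)
if (requireNamespace("RcppPlanc", quietly = TRUE)) {
    pbmc <- runINMF(pbmc, k = 20, nIteration = 2)
    # Fake new data derived from "ctrl", with unique cell identifiers
    ctrl2 <- rawData(dataset(pbmc, "ctrl"))
    ctrl2@x <- ctrl2@x + 1
    colnames(ctrl2) <- paste0(colnames(ctrl2), 2)
    # merge = FALSE: treat the new data as a new dataset whose initial V
    # matrix is copied from "ctrl", instead of appending the cells to "ctrl"
    pbmcNew <- optimizeNewData(pbmc, dataNew = list(ctrl2 = ctrl2),
                               useDatasets = "ctrl", merge = FALSE,
                               nIteration = 2)
}
```

As in the example above, nIteration = 2 only keeps the run short; a real analysis would use more iterations.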

rliger documentation built on Oct. 30, 2024, 1:07 a.m.