CFITIntegrate_sketched: Integration of multiple data sources by fast cFIT with...
In pengminshi/cFIT: cFIT (common Factor Space Integration & Transfer)

View source: R/data_integration_sketched.R

Solve for the model parameters through iterative Nonnegative Matrix Factorization (iNMF) by minimizing the sketched objective function

\frac{1}{\tilde{N}} \sum_{j=1}^{m} \| S X_j - (S H_j W^T \Lambda_j + S 1_{n_j} b_j^T) \|_F^2 + \gamma \sum_{l=1}^{p} ( \sum_{j=1}^{m} \frac{\tilde{n}_j}{\tilde{N}} \lambda_{jl} - 1 )^2

, with additional penalty for SPP.
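The objective above can be evaluated directly for a given set of parameters. The following is a minimal sketch, not the package's implementation: `sketched_objective` and `idx.list` are hypothetical names, the sketching matrix S is assumed to be plain row subsampling (one index vector of sampled cells per batch), and the additional SPP penalty is left out.

```r
# Hypothetical helper, not part of the cFIT API.
# Assumes S selects rows (cells), given as one index vector per batch in idx.list.
sketched_objective <- function(X.list, H.list, W, lambda.list, b.list,
                               idx.list, gamma = 1) {
  n.tilde <- vapply(idx.list, length, integer(1))  # sketched batch sizes
  N.tilde <- sum(n.tilde)                          # total sketched size
  fit <- 0
  for (j in seq_along(X.list)) {
    rows <- idx.list[[j]]
    Xs <- X.list[[j]][rows, , drop = FALSE]        # S X_j
    Hs <- H.list[[j]][rows, , drop = FALSE]        # S H_j
    # (S H_j) W^T Lambda_j: scale gene l by lambda_jl, then add the shift b_j
    pred <- sweep(Hs %*% t(W), 2, lambda.list[[j]], `*`)
    pred <- sweep(pred, 2, b.list[[j]], `+`)
    fit <- fit + sum((Xs - pred)^2)                # squared Frobenius norm
  }
  lambda.mat <- do.call(rbind, lambda.list)        # m x p matrix of scalings
  # Weighted average of the scalings per gene should stay close to 1
  penalty <- sum((colSums(lambda.mat * (n.tilde / N.tilde)) - 1)^2)
  fit / N.tilde + gamma * penalty
}
```

With data generated exactly from the model and all scalings equal to 1, both terms vanish and the function returns a value that is zero up to floating-point error.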

CFITIntegrate_sketched(
  X.list,
  r = 15,
  max.niter = 100,
  nrep = 1,
  init = NULL,
  subsample.prop = NULL,
  weight.list = NULL,
  tol = 1e-06,
  early.stopping = 50,
  time.out = 60 * 2,
  future.plan = c("sequential", "transparent", "multicore", "multisession", "cluster"),
  workers = parallel::detectCores() - 1,
  verbose = T,
  seed = 0
)

`X.list`	a list of m ncells-by-ngenes gene expression matrices from m data sets
`r`	scalar, dimension of common factor matrix, which can be chosen as the rough number of identifiable cell types in the joint population (default 15).
`max.niter`	integer, max number of iterations (default 100).
`nrep`	integer, number of repeated runs (to reduce effect of local optimum, default 1)
`init`	a list of parameters for parameter initialization. The list either contains all parameter sets (W, lambda.list, b.list, H.list); otherwise only W is used, if provided (default NULL).
`subsample.prop`	a scalar between 0 and 1. A smaller proportion results in faster computation but less accurate results. By default the value is set to `min(5*10^4/ntotal, 1)`.
`weight.list`	weights for performing weighted subsampling sketching. `weight.list` is a list with one weight vector per batch; each vector holds nonnegative values and has the same length as the number of cells in that batch. For example, `weight.list = lapply(1:length(X.list), function(j) statistical_leverage_score(X.list[[j]], k=r))` (default NULL).
`tol`	numeric scalar, tolerance used in stopping criteria (default 1e-6).
`early.stopping`	integer, stop early if the objective function has not improved for this number of iterations (default 50).
`time.out`	stop after this many minutes of running (default 120, i.e. `60 * 2`).
`future.plan`	plan for future parallel computation, can be chosen from 'sequential', 'transparent', 'multicore', 'multisession' and 'cluster'. Default is 'sequential'. Note that RStudio does not support 'multicore'.
`workers`	additional parameter for `future::plan()`, used with 'multicore', 'multisession' and 'cluster' (default `parallel::detectCores() - 1`).
`verbose`	boolean scalar, whether to show extensive program logs (default TRUE)
`seed`	random seed used (default 0)
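To make `subsample.prop` and `weight.list` concrete, here is a small base-R sketch on simulated data. It is an illustration under stated assumptions, not the package's code: `leverage_weights` is a hypothetical stand-in for `statistical_leverage_score()` that uses the textbook rank-k leverage score (squared row norms of the top-k left singular vectors), and the weighted draw with `sample.int()` only illustrates what weighted subsampling means, not how the function samples internally.

```r
set.seed(0)

# Two simulated batches (cells x genes); the sizes are arbitrary.
X.list <- list(matrix(rexp(200 * 30), 200, 30),
               matrix(rexp(120 * 30), 120, 30))
r <- 5

# Documented default for subsample.prop: min(5*10^4 / ntotal, 1)
ntotal <- sum(vapply(X.list, nrow, integer(1)))
subsample.prop <- min(5 * 10^4 / ntotal, 1)

# Hypothetical stand-in for statistical_leverage_score():
# rank-k leverage score of a row = squared norm of that row of U_k.
leverage_weights <- function(X, k) {
  U <- svd(X, nu = k, nv = 0)$u
  rowSums(U^2)
}

# One nonnegative weight vector per batch, one weight per cell
weight.list <- lapply(X.list, leverage_weights, k = r)

# Illustration only: draw half of each batch, favoring high-weight cells
prop <- 0.5
idx.list <- lapply(seq_along(X.list), function(j) {
  n <- nrow(X.list[[j]])
  sample.int(n, size = ceiling(prop * n), prob = weight.list[[j]])
})

# The objects would then be passed along these lines (not run here):
# res <- CFITIntegrate_sketched(X.list, r = r, subsample.prop = prop,
#                               weight.list = weight.list)
```

With only 320 cells in total, the default rule gives `subsample.prop = 1`, so sketching only starts to reduce the work once the combined data exceed 50,000 cells.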

a list containing

W: ngenes-by-r numeric matrix, estimated common factor matrix
H.list: a list of m estimated factor loading matrices, each of size ncells-by-r
b.list: a list of estimated shift vectors, each of size p (ngenes).
lambda.list: a list of estimated scaling vectors, each of size p (ngenes).
convergence: boolean, whether the algorithm converged
obj: numeric scalar, value of the objective function at convergence or when the maximum number of iterations is reached
obj.history: a numeric vector, value of the objective function per iteration
deltaw: numeric, the relative change in W (common factor matrix) measured by Frobenius norm
deltaw.history: a vector of numeric values, the relative change in W (common factor matrix) per iteration.
niter: integer, the iteration at convergence (or the maximum iteration if not converged)
params: list of parameters used for the algorithm: max.iter, tol, nrep, subsample.prop, weight.list
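The returned components combine according to the model inside the objective, X_j ≈ H_j W^T Λ_j + 1 b_j^T. The sketch below rebuilds the fitted expression matrix of one batch from such a list. `reconstruct_batch` is a hypothetical helper, not a cFIT function, and `res` is a small mock list that only mimics the documented fields and shapes.

```r
# Hypothetical helper, not part of the cFIT API.
# `res` is assumed to have the documented fields W, H.list, lambda.list, b.list.
reconstruct_batch <- function(res, j) {
  HW <- res$H.list[[j]] %*% t(res$W)                  # ncells x ngenes
  scaled <- sweep(HW, 2, res$lambda.list[[j]], `*`)   # apply gene-wise scaling
  sweep(scaled, 2, res$b.list[[j]], `+`)              # apply gene-wise shift
}

# Mock result with the documented shapes: 4 genes, r = 2, one batch of 3 cells
set.seed(1)
res <- list(
  W = matrix(runif(8), 4, 2),
  H.list = list(matrix(runif(6), 3, 2)),
  lambda.list = list(c(1, 2, 1, 0.5)),
  b.list = list(rep(0.1, 4)),
  convergence = TRUE,
  niter = 10L
)

if (!isTRUE(res$convergence)) {
  warning("algorithm stopped after ", res$niter, " iterations without converging")
}
X.hat <- reconstruct_batch(res, 1)
dim(X.hat)  # cells x genes
```

Checking `convergence` before using the factors is a cheap guard: if it is FALSE, the run stopped at `niter` without meeting `tol`, and rerunning with a larger `max.niter` or `nrep` may be worthwhile.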