CFITIntegrate_sketched: Integration of multiple data sources by fast cFIT with...


View source: R/data_integration_sketched.R

Description

Solve for the model parameters through Iterative Nonnegative Matrix Factorization (iNMF) by minimizing the sketched objective function

\frac{1}{\tilde{N}} \sum_{j=1}^{m} \| S X_j - (S H_j W^T \Lambda_j + S 1_{n_j} b_j^T) \|_F^2 + \gamma \sum_{l=1}^{p} \left( \sum_{j=1}^{m} \frac{\tilde{n}_j}{\tilde{N}} \lambda_{jl} - 1 \right)^2

, with additional penalty for SPP.
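To make the notation concrete, the following base-R sketch evaluates the two terms of the objective for given parameter values. It is an illustration only, not the package's implementation: the helper name `sketched_objective`, the representation of the sketch S as a vector of sampled row indices per batch (`idx.list`), and the argument name `gamma` are assumptions made for this example, and the additional SPP penalty is not included.

```r
# Hypothetical helper (not part of cFIT): evaluate the sketched objective.
# X.list, H.list      : lists of n_j-by-p and n_j-by-r matrices
# W                   : p-by-r common factor matrix
# lambda.list, b.list : lists of length-p scaling and shift vectors
# idx.list            : assumed representation of S, sampled row indices per batch
sketched_objective <- function(X.list, H.list, W, lambda.list, b.list,
                               idx.list, gamma = 1) {
  m <- length(X.list)
  n.tilde <- vapply(idx.list, length, integer(1))  # sketched sizes per batch
  N.tilde <- sum(n.tilde)

  fit <- 0
  for (j in seq_len(m)) {
    SX <- X.list[[j]][idx.list[[j]], , drop = FALSE]
    SH <- H.list[[j]][idx.list[[j]], , drop = FALSE]
    # S H_j W^T Lambda_j: multiply column l by lambda_{jl}
    approx <- sweep(SH %*% t(W), 2, lambda.list[[j]], `*`)
    # S 1_{n_j} b_j^T: add the shift b_j to every row
    approx <- sweep(approx, 2, b.list[[j]], `+`)
    fit <- fit + sum((SX - approx)^2)
  }

  # Penalty: the size-weighted average of lambda_{jl} over batches should be 1
  lambda.mat <- do.call(rbind, lambda.list)         # m-by-p
  avg <- colSums(lambda.mat * (n.tilde / N.tilde))  # row j weighted by n~_j / N~
  fit / N.tilde + gamma * sum((avg - 1)^2)
}

# Toy check: data generated exactly from the model, with all scalings equal to 1
set.seed(1)
p <- 4; r <- 2; n <- 6
W <- matrix(runif(p * r), p, r)
H <- matrix(runif(n * r), n, r)
lambda <- rep(1, p)
b <- runif(p)
X <- sweep(sweep(H %*% t(W), 2, lambda, `*`), 2, b, `+`)
obj <- sketched_objective(list(X), list(H), W, list(lambda), list(b),
                          idx.list = list(c(1, 3, 5)))
```

In this toy case both the fit term and the penalty vanish, so `obj` is zero up to floating-point error.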

Usage

CFITIntegrate_sketched(
  X.list,
  r = 15,
  max.niter = 100,
  nrep = 1,
  init = NULL,
  subsample.prop = NULL,
  weight.list = NULL,
  tol = 1e-06,
  early.stopping = 50,
  time.out = 60 * 2,
  future.plan = c("sequential", "transparent", "multicore", "multisession", "cluster"),
  workers = parallel::detectCores() - 1,
  verbose = T,
  seed = 0
)
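A minimal call might look like the sketch below. The simulated matrices and the chosen argument values are illustrative assumptions rather than recommendations; only arguments listed in the usage above are passed, and the call is skipped when the cFIT package is not installed. Whether random data of this kind is a meaningful input is not addressed here.

```r
# Simulated toy data: two batches measured on the same genes (illustrative only)
set.seed(0)
ngenes <- 50
genes <- paste0("gene", seq_len(ngenes))
X.list <- list(
  matrix(rexp(200 * ngenes), nrow = 200, ncol = ngenes,
         dimnames = list(paste0("a", 1:200), genes)),
  matrix(rexp(300 * ngenes), nrow = 300, ncol = ngenes,
         dimnames = list(paste0("b", 1:300), genes))
)

# Only attempt the call if the cFIT package is available
if (requireNamespace("cFIT", quietly = TRUE)) {
  fit <- cFIT::CFITIntegrate_sketched(
    X.list,
    r = 5,
    max.niter = 20,
    future.plan = "sequential",
    verbose = FALSE,
    seed = 0
  )
  str(fit$W)  # documented as an ngenes-by-r matrix
}
```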

Arguments

X.list

a list of m ncells-by-ngenes gene expression matrices, one per data set

r

scalar, dimension of the common factor matrix, which can be chosen as the rough number of identifiable cell types in the joint population (default 15).

max.niter

integer, max number of iterations (default 100).

nrep

integer, number of repeated runs (to reduce the effect of local optima, default 1)

init

a list of parameters used for initialization. The list either contains all parameter sets (W, lambda.list, b.list, H.list) or only W; in the latter case only W is used for initialization (default NULL).

subsample.prop

a scalar between 0 and 1. A smaller proportion results in faster computation but less accurate results. By default the value is set to min(5*10^4/ntotal, 1)

weight.list

weights for weighted subsampling (sketching). Note that weight.list is a list with one element per batch; the weights for each batch are a vector of nonnegative values whose length equals the number of cells in that batch.

tol

numeric scalar, tolerance used in the stopping criteria (default 1e-6).

early.stopping

Stop early if the objective function has not improved for this number of iterations (default 50).

time.out

Stop after running for this number of minutes (default 60 * 2).

future.plan

plan for parallel computation with the future package, chosen from 'sequential', 'transparent', 'multicore', 'multisession' and 'cluster'. Default is 'sequential'. Note that RStudio does not support 'multicore'.

workers

additional parameter for future::plan(), used with 'multicore', 'multisession' and 'cluster' (default parallel::detectCores() - 1). Example of constructing weight.list: weight.list = lapply(1:length(X.list), function(j) statistical_leverage_score(X.list[[j]], k=r))

verbose

boolean scalar, whether to show extensive program logs (default TRUE)

seed

random seed used (default 0)
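The interplay of subsample.prop and weight.list can be illustrated with a small base-R sketch of weighted subsampling. Everything here is an assumption made for illustration and none of these helpers are cFIT functions: ntotal is taken to be the total number of cells across batches, leverage scores are computed from the top-k left singular vectors (the package's statistical_leverage_score may be defined differently), and rows are sampled without replacement.

```r
# Hypothetical helpers for illustration; none of these are cFIT functions.

# Default proportion, assuming ntotal = total number of cells across batches
default_subsample_prop <- function(X.list) {
  ntotal <- sum(vapply(X.list, nrow, integer(1)))
  min(5 * 10^4 / ntotal, 1)
}

# Rank-k leverage scores: squared row norms of the top-k left singular vectors
leverage_scores <- function(X, k) {
  U <- svd(X, nu = k, nv = 0)$u
  rowSums(U^2)
}

# Weighted subsample of row indices for one batch (assumed: without replacement)
sketch_rows <- function(X, prop, w) {
  size <- max(1, round(prop * nrow(X)))
  sort(sample(nrow(X), size = size, prob = w))
}

set.seed(0)
X.list <- list(
  matrix(rexp(100 * 20), nrow = 100, ncol = 20),
  matrix(rexp(150 * 20), nrow = 150, ncol = 20)
)

prop <- default_subsample_prop(X.list)  # 1 here, since 250 cells < 5 * 10^4
weight.list <- lapply(X.list, leverage_scores, k = 5)

# Use a smaller proportion than the default to actually subsample the toy data
idx.list <- Map(function(X, w) sketch_rows(X, prop = 0.2, w = w),
                X.list, weight.list)
lengths(idx.list)
```

Cells with larger weights are more likely to be kept, so informative rows are better represented in the sketch than under uniform subsampling.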

Value

a list containing

W

ngenes-by-r numeric matrix, estimated common factor matrix

H.list

a list of m estimated factor loading matrices, each of size ncells-by-r

b.list

a list of m estimated shift vectors, each of size p (ngenes).

lambda.list

a list of m estimated scaling vectors, each of size p (ngenes).

convergence

boolean, whether the algorithm converged

obj

numeric scalar, value of the objective function at convergence or when the maximum number of iterations is reached

obj.history

a numeric vector, value of the objective function per iteration

deltaw

numeric, the relative change in W (common factor matrix) measured by the Frobenius norm

deltaw.history

a vector of numeric values, the relative change in W (common factor matrix) per iteration.

niter

integer, the iteration at convergence (or the maximum number of iterations if the algorithm did not converge)

params

list of parameters used for the algorithm: max.iter, tol, nrep, subsample.prop, weight.list
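As an illustration of how deltaw relates to tol, the sketch below computes a relative change in W between two iterates and compares it with a tolerance. The exact formula used by the package is not stated on this page; the ratio of Frobenius norms shown here, the helper name `relative_change`, and the comparison rule are assumptions.

```r
# Assumed form of the relative change: ||W_new - W_old||_F / ||W_old||_F
relative_change <- function(W.new, W.old) {
  norm(W.new - W.old, type = "F") / norm(W.old, type = "F")
}

W.old <- matrix(1, nrow = 4, ncol = 2)
W.new <- W.old * 1.01                      # every entry changed by 1%

deltaw <- relative_change(W.new, W.old)    # about 0.01
tol <- 1e-06
converged <- deltaw < tol                  # FALSE: the change is still too large
```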


pengminshi/cFIT documentation built on July 11, 2021, 11:12 p.m.