In DongyueXie/stm: Empirical Bayes Poisson Matrix Factorization

ebpmf_log

R Documentation

Fit empirical Bayes Poisson matrix factorization with log link function

Description

Fit empirical Bayes Poisson matrix factorization with log link function

Usage

ebpmf_log(
  Y,
  l0 = NULL,
  f0 = NULL,
  var_type = "by_col",
  general_control = list(),
  vga_control = list(),
  flash_control = list(),
  sigma2_control = list(),
  init_control = list(),
  verbose = TRUE
)

Arguments

`Y`	Count data matrix; can be in sparse format.
`l0`, `f0`	The background loadings and factors, see the model in ‘Details’.
`var_type`	Variance type: "by_row", "by_col" or "constant"; see the model in ‘Details’.
`general_control`	A list of parameters controlling the behavior of the algorithm. See ‘Details’.
`vga_control`	A list of parameters controlling the behavior of the VGA step. See ‘Details’.
`flash_control`	A list of parameters controlling the behavior of the flash step. See ‘Details’.
`sigma2_control`	A list of parameters controlling how the variance is updated. See ‘Details’.
`init_control`	A list of parameters controlling the initialization. See ‘Details’.
`verbose`	If TRUE, print model-fitting progress.
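The Arguments list does not spell out how var_type maps onto \sigma^2_{ij}. A minimal sketch, assuming "by_row" stores one variance per row i, "by_col" one per column j, and "constant" a single shared value; expand_sigma2() is a hypothetical helper, not a package function.

```r
# Hypothetical helper (not part of the package): expand a variance estimate
# into the full n x p matrix of sigma^2_{ij}. Assumes "by_row" = one value per
# row, "by_col" = one value per column, "constant" = one value overall.
expand_sigma2 <- function(sigma2, n, p,
                          var_type = c("by_col", "by_row", "constant")) {
  var_type <- match.arg(var_type)
  switch(var_type,
    by_row = {
      stopifnot(length(sigma2) == n)
      matrix(sigma2, nrow = n, ncol = p)                # row i gets sigma2[i]
    },
    by_col = {
      stopifnot(length(sigma2) == p)
      matrix(sigma2, nrow = n, ncol = p, byrow = TRUE)  # column j gets sigma2[j]
    },
    constant = {
      stopifnot(length(sigma2) == 1)
      matrix(sigma2, nrow = n, ncol = p)                # same value everywhere
    }
  )
}

S <- expand_sigma2(c(0.1, 0.2, 0.3, 0.4), n = 3, p = 4, var_type = "by_col")
```

Under this reading, "by_col" (the default) gives each column its own overdispersion level, which is natural when columns are genes with different noise levels.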

Details

The model is

y_{ij}\sim \text{Poisson}(\exp(\mu_{ij})),

\mu_{ij} = l_{i0} + f_{j0} + \sum_k l_{ik}f_{jk} + \epsilon_{ij},

l_{i0}\sim g_{l_0}(\cdot), f_{j0}\sim g_{f_0}(\cdot),

l_{ik}\sim g_{l_k}(\cdot), f_{jk}\sim g_{f_k}(\cdot),

\epsilon_{ij}\sim N(0,\sigma^2_{ij}).
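To make the generative model concrete, the following base-R sketch simulates a small count matrix from it. The sizes, the nonnegative uniform loadings and factors, and the per-column variance are arbitrary illustrative choices, not package defaults.

```r
# Simulate counts from the model in 'Details' (illustrative values only).
set.seed(1)
n <- 100; p <- 50; K <- 2

l0     <- rnorm(n, mean = 0, sd = 0.5)          # background loadings l_{i0}
f0     <- rnorm(p, mean = 1, sd = 0.5)          # background factors f_{j0}
L_mat  <- matrix(runif(n * K), nrow = n, ncol = K)  # loadings l_{ik}
F_mat  <- matrix(runif(p * K), nrow = p, ncol = K)  # factors f_{jk}
sigma2 <- rep(0.1, p)                           # one variance per column

# eps_{ij} ~ N(0, sigma^2_j): matrix fill is column-major, so each block of
# n draws uses the standard deviation of its column.
eps <- matrix(rnorm(n * p, sd = rep(sqrt(sigma2), each = n)), nrow = n, ncol = p)

# mu_{ij} = l_{i0} + f_{j0} + sum_k l_{ik} f_{jk} + eps_{ij}
mu <- outer(l0, f0, "+") + L_mat %*% t(F_mat) + eps

# y_{ij} ~ Poisson(exp(mu_{ij}))
Y <- matrix(rpois(n * p, lambda = exp(mu)), nrow = n, ncol = p)
```

A matrix like Y is the kind of input the Y argument expects.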

The init_control argument is a list in which any of the following named components will override the default algorithm settings (as defined by ebpmf_log_init_control_default):

sigma2_init: Initial value of sigma2.
M_init: Initial value of the latent matrix M.
init_tol: Tolerance for initialization.
init_maxiter: Maximum number of iterations for initialization.
verbose: TRUE to print initialization progress.
printevery: How often to print progress.
ebpm_init: Whether to fit a single-gene model with ash_pois and use it as the initialization for VGA.
conv_type: Convergence criterion for the initial VGA fit, either 'elbo' or 'sigma2abs'.
n_cores: Number of cores used for initialization, via the 'mclapply' function.
flash_est_sigma2: TRUE to use flash for initializing sigma2.
log_init_for_non0y: If TRUE, use log(Y/exp(offset)) as the initial values for non-zero counts.
n_refit_flash_init: Number of times to refit flash with a different seed if no structure is found during initialization.
deal_with_no_init_factor: If no factor is found during initialization, use 'reduce_var' to reduce the initial variance passed to flash, or 'flash_dryrun' to not provide the variance.
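As an illustration of the override behavior, a user passes only the components to change. The merge below uses utils::modifyList with a placeholder defaults list; both the merge mechanism and the placeholder values are assumptions for illustration and are not the package's actual defaults.

```r
# User-supplied overrides: only these components change; the rest keep
# their default values.
my_init_control <- list(
  init_maxiter = 50,     # cap the number of initialization iterations
  verbose      = FALSE,  # silence initialization progress
  n_cores      = 2       # run initialization on 2 cores via mclapply
)

# Sketch of an override-style merge. 'defaults' is a made-up placeholder
# standing in for the settings described by ebpmf_log_init_control_default;
# its values are NOT the real defaults.
defaults <- list(init_maxiter = 100, init_tol = 1e-3, verbose = TRUE, n_cores = 1)
merged   <- utils::modifyList(defaults, my_init_control)
```

In a real call, my_init_control would be passed as init_control = my_init_control; components not named in it, such as init_tol here, are left untouched.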

The general_control argument is a list in which any of the following named components will override the default algorithm settings (as defined by ebpmf_log_general_control_default):

batch_size: Set to 1000, 10000, or similar to reduce memory usage in the VGA step by looping over subsets of the dataset.
maxiter: Maximum number of iterations allowed.
conv_tol: Tolerance for convergence.
printevery: How often to print progress over iterations.
verbose: TRUE to print progress.
garbage_collection_every: How often to perform 'gc()' to reduce memory usage.
save_init_val: TRUE to return the initialization values of latent mu and sigma2.
save_latent_M: TRUE to return the latent M; its size can be very large.
save_fit_every: How often to save intermediate results.
save_fit_path: Path where intermediate results are saved.
save_fit_name: Name under which intermediate results are saved.
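The batch_size option trades speed for memory by processing the data in chunks. A hypothetical sketch of such chunking, checked against a full-matrix computation; make_batches() is not a package function, and splitting by rows (rather than columns) is an assumption here.

```r
# Hypothetical helper: split indices 1..n into consecutive batches of at most
# batch_size elements each.
make_batches <- function(n, batch_size) {
  idx <- seq_len(n)
  split(idx, ceiling(idx / batch_size))
}

lengths(make_batches(25000, batch_size = 10000))  # batch sizes 10000, 10000, 5000

# Looping over batches: each pass works on a slice of at most batch_size rows,
# and the per-batch results together equal the full-matrix result.
set.seed(42)
Y  <- matrix(rpois(2500 * 40, lambda = 3), nrow = 2500, ncol = 40)
rs <- numeric(nrow(Y))
for (b in make_batches(nrow(Y), batch_size = 1000)) {
  rs[b] <- rowSums(Y[b, , drop = FALSE])
}
all.equal(rs, rowSums(Y))
```

Smaller batches lower the peak size of intermediate objects at the cost of more loop iterations.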

The flash_control argument is a list in which any of the following named components will override the default algorithm settings (as defined by ebpmf_log_flash_control_default):

ebnm.fn: see '?flash', 'ebnm_fn'.
ebnm.fn.offset: The prior for l_0, f_0, if not fixing them.
loadings_sign: see '?flash_greedy_init_default' sign_constraints, must match ebnm.fn
factors_sign: see '?flash_greedy_init_default' sign_constraints, must match ebnm.fn
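fix_l0: Whether to fix l_0 (if not fixed, it is estimated with the ebnm.fn.offset prior).
fix_f0: Whether to fix f_0 (if not fixed, it is estimated with the ebnm.fn.offset prior).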
Kmax: see '?flash', 'greedy_Kmax'.
add_greedy_Kmax: The Kmax used by add_greedy during the iterations.
add_greedy_warmstart: see '?flash_greedy'
add_greedy_extrapolate: see '?flash_greedy'
add_greedy_every: perform flash_greedy every 'add_greedy_every' iterations.
maxiter_backfitting: Maximum number of iterations for flash backfitting; see '?flash_backfit'.
backfit_extrapolate: see '?flash_backfit'
backfit_warmstart: see '?flash_backfit'
verbose_flash: Whether to print flash updates.
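When fix_l0 and fix_f0 are TRUE, the background terms are presumably kept at the supplied l0 and f0. One simple way to construct them, offered as an assumption rather than the package's own initialization, is the maximum-likelihood fit of the rank-one model y_{ij} ~ Poisson(exp(l_{i0} + f_{j0})), whose fitted means are R_i C_j / N (row sum times column sum, divided by the total count). background_offsets() is a hypothetical helper.

```r
# Hypothetical helper: background terms from the rank-one Poisson model
# y_ij ~ Poisson(exp(l0_i + f0_j)). Its MLE is the independence fit
# exp(l0_i + f0_j) = R_i * C_j / N (row sums R, column sums C, total N).
background_offsets <- function(Y) {
  R <- rowSums(Y)
  C <- colSums(Y)
  N <- sum(Y)
  # log(0) = -Inf for all-zero rows or columns; remove or pseudo-count them first.
  list(l0 = log(R), f0 = log(C / N))
}

set.seed(2)
Y  <- matrix(rpois(30 * 20, lambda = 5), nrow = 30, ncol = 20)
bg <- background_offsets(Y)
fitted_mean <- exp(outer(bg$l0, bg$f0, "+"))  # reproduces the margins of Y
```

The vectors could then be passed as, e.g., ebpmf_log(Y, l0 = bg$l0, f0 = bg$f0, flash_control = list(fix_l0 = TRUE, fix_f0 = TRUE)), assuming l0 and f0 accept vectors of length nrow(Y) and ncol(Y). Only the sum l_{i0} + f_{j0} is identified, so how the total is split between the two vectors is arbitrary.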

The vga_control argument is a list in which any of the following named components will override the default algorithm settings (as defined by ebpmf_log_vga_control_default):

maxiter_vga: Maximum number of iterations of Newton's method in the VGA step.
vga_tol: tolerance for stopping the optimization.
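The VGA step approximates the posterior of each \mu_{ij} by a Gaussian N(m, v). A per-entry sketch, assuming the prior mean b collects l_{i0} + f_{j0} + \sum_k l_{ik}f_{jk} and s2 is \sigma^2_{ij}: it alternates damped Newton steps on m and v until both change by less than vga_tol or maxiter_vga is reached. This is a generic routine for the Poisson-lognormal ELBO; the package's implementation (vectorization, step control, stopping rule) may differ, and vga_one() is hypothetical.

```r
# Sketch of a Gaussian variational approximation for one entry:
#   y ~ Poisson(exp(mu)), mu ~ N(b, s2), q(mu) = N(m, v)
#   ELBO(m, v) = y*m - exp(m + v/2) - ((m - b)^2 + v)/(2*s2) + log(v)/2 + const
# The ELBO is concave in m and in v, so Newton steps move uphill near the optimum.
vga_one <- function(y, b, s2, maxiter_vga = 100, vga_tol = 1e-8) {
  m <- b
  v <- s2
  for (iter in seq_len(maxiter_vga)) {
    m_old <- m
    v_old <- v
    # Newton step for m with v fixed
    e      <- exp(m + v / 2)
    grad_m <- y - e - (m - b) / s2
    hess_m <- -e - 1 / s2
    step   <- max(min(grad_m / hess_m, 1), -1)  # damp large steps
    m      <- m - step
    # Newton step for v with m fixed
    e      <- exp(m + v / 2)
    grad_v <- 1 / (2 * v) - e / 2 - 1 / (2 * s2)
    hess_v <- -1 / (2 * v^2) - e / 4
    v_new  <- v - grad_v / hess_v
    v      <- if (v_new > 0) v_new else v / 2   # keep the variance positive
    if (abs(m - m_old) < vga_tol && abs(v - v_old) < vga_tol) break
  }
  list(m = m, v = v, iter = iter)
}

fit_entry <- vga_one(y = 5, b = 0, s2 = 1)
```

At convergence the stationarity conditions y - exp(m + v/2) - (m - b)/s2 = 0 and 1/v = exp(m + v/2) + 1/s2 hold; the second shows that v shrinks as the expected count grows.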

The sigma2_control argument is a list in which any of the following named components will override the default algorithm settings (as defined by ebpmf_log_sigma2_control_default):

est_sigma2: Whether to estimate the variance term or fix it at sigma2_init.
a0,b0: Inverse-Gamma(a0,b0) prior on sigma2 for regularization.
cap_var_mean_ratio: Only update sigma2 when var/mean > (1+cap_var_mean_ratio); i.e., once overdispersion is low enough, stop updating sigma2 to speed up convergence.
return_sigma2_trace: TRUE to return the sigma2 values along the iterations. For internal use only.
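Two pieces of arithmetic help with sigma2_control. First, under an Inverse-Gamma(a0, b0) prior, one natural regularized update is the posterior mode (b0 + S/2)/(a0 + n/2 + 1), where S is the expected sum of squared residuals over the n entries sharing that variance; this form is an assumption, and the package may use a different one. Second, for the lognormal-Poisson model the variance-to-mean ratio is 1 + E[y](exp(\sigma^2) - 1), which is why var/mean close to 1 signals a small \sigma^2. All function names below are hypothetical.

```r
# Hypothetical sketch: MAP update of one variance under an Inverse-Gamma(a0, b0)
# prior, given variational means m, variances v and prior means b of the entries
# sharing this sigma2. The posterior is Inverse-Gamma(a0 + n/2, b0 + S/2).
sigma2_map <- function(m, v, b, a0, b0) {
  S <- sum((m - b)^2 + v)          # expected sum of squared residuals
  n <- length(m)
  (b0 + S / 2) / (a0 + n / 2 + 1)  # mode of the Inverse-Gamma posterior
}

# If y ~ Poisson(exp(mu)) with mu ~ N(eta, s2), then E[y] = exp(eta + s2/2) and
# Var[y] = E[y] + E[y]^2 * (exp(s2) - 1), so Var/Mean = 1 + E[y] * (exp(s2) - 1).
var_mean_ratio <- function(eta, s2) 1 + exp(eta + s2 / 2) * (exp(s2) - 1)

# Gate mirroring the documented rule: update only if var/mean > 1 + cap.
should_update_sigma2 <- function(y, cap_var_mean_ratio) {
  var(y) / mean(y) > 1 + cap_var_mean_ratio
}

sigma2_map(m = rep(1, 4), v = rep(0.5, 4), b = rep(0, 4), a0 = 1, b0 = 1)
```

Larger a0 and b0 pull the estimate toward b0/(a0 + 1), which stabilizes variances estimated from few entries.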

Value

A list of:

`fit_flash:`	fitted flash object
`elbo:`	evidence lower bound value
`K_trace:`	trace of number of factors
`elbo_trace:`	trace of elbo
`sigma2:`	the variance estimates
`run_time:`	run time of the algorithm
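A minimal usage sketch showing how the returned components might be inspected. It assumes the package from DongyueXie/stm is installed under the name stm; because an unrelated CRAN package is also named stm, the check confirms that ebpmf_log is exported before calling it. The data are pure Poisson noise, so few or no factors may be found.

```r
# Assumes the GitHub package DongyueXie/stm is installed as "stm"; the export
# check guards against the unrelated CRAN package with the same name.
has_ebpmf <- requireNamespace("stm", quietly = TRUE) &&
  "ebpmf_log" %in% getNamespaceExports("stm")

set.seed(123)
Y <- matrix(rpois(50 * 30, lambda = 2), nrow = 50, ncol = 30)

if (has_ebpmf) {
  fit <- stm::ebpmf_log(
    Y,
    var_type        = "by_col",
    general_control = list(maxiter = 50),  # keep the example short
    verbose         = FALSE
  )
  # Inspect the documented components of the returned list
  str(fit[c("elbo", "K_trace", "sigma2", "run_time")])
}
```

fit$elbo_trace can be plotted to check that the ELBO has stabilized, and fit$fit_flash holds the fitted flash object for downstream use.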