ebpmf_log: Fit empirical Bayes Poisson matrix factorization with log...

View source: R/ebpmf_log.R

ebpmf_log    R Documentation

Fit empirical Bayes Poisson matrix factorization with log link function

Description

Fit empirical Bayes Poisson matrix factorization with log link function

Usage

ebpmf_log(
  Y,
  l0 = NULL,
  f0 = NULL,
  var_type = "by_col",
  general_control = list(),
  vga_control = list(),
  flash_control = list(),
  sigma2_control = list(),
  init_control = list(),
  verbose = TRUE
)

Arguments

Y

A count data matrix; it can be in sparse format.

l0, f0

The background loadings and factors, see the model in ‘Details’.

var_type

Variance type: one of "by_row", "by_col" or "constant"; see the model in ‘Details’.

general_control

A list of parameters controlling the behavior of the algorithm. See ‘Details’.

vga_control

A list of parameters controlling the behavior of the VGA step. See ‘Details’.

flash_control

A list of parameters controlling the behavior of the flash step. See ‘Details’.

sigma2_control

A list of parameters controlling how the variance is updated. See ‘Details’.

init_control

A list of parameters controlling the initialization. See ‘Details’.

verbose

If TRUE, print the model-fitting progress.
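The var_type argument controls how the noise variances σ²_ij in the model under ‘Details’ are shared. The sketch below assumes the natural reading of the option names: "by_row" means σ²_ij = σ²_i, "by_col" means σ²_ij = σ²_j, and "constant" means a single σ² for every entry. The helper expand_sigma2 is hypothetical and not part of the package; it only illustrates that reading.

```r
# Hypothetical helper (not part of the package): expand a variance vector
# into the n x p matrix of sigma^2_{ij} implied by var_type.
# Assumption: "by_row" -> sigma2_i, "by_col" -> sigma2_j, "constant" -> sigma2.
expand_sigma2 <- function(sigma2, n, p,
                          var_type = c("by_col", "by_row", "constant")) {
  var_type <- match.arg(var_type)
  switch(var_type,
    by_row = {
      stopifnot(length(sigma2) == n)
      matrix(sigma2, nrow = n, ncol = p)                # row i shares sigma2[i]
    },
    by_col = {
      stopifnot(length(sigma2) == p)
      matrix(sigma2, nrow = n, ncol = p, byrow = TRUE)  # column j shares sigma2[j]
    },
    constant = {
      stopifnot(length(sigma2) == 1)
      matrix(sigma2, nrow = n, ncol = p)                # one value everywhere
    }
  )
}

S <- expand_sigma2(c(0.1, 0.5, 1), n = 2, p = 3, var_type = "by_col")
```

With "by_col" the number of variance parameters grows with the number of columns p, while "constant" uses a single parameter; "by_row" sits in between when n is small.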

Details

The model is

y_{ij}\sim \text{Poisson}(\exp(\mu_{ij})),

\mu_{ij} = l_{i0} + f_{j0} + \sum_k l_{ik}f_{jk} + \epsilon_{ij},

l_{i0}\sim g_{l_0}(\cdot), f_{j0}\sim g_{f_0}(\cdot),

l_{ik}\sim g_{l_k}(\cdot),f_{jk}\sim g_{f_k}(\cdot),

\epsilon_{ij}\sim N(0,\sigma^2_{ij}).
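To make the model concrete, the following base-R sketch simulates a count matrix from it with K = 2 factors and "by_col" variances. The dimensions, the distributions used to draw l_{i0}, f_{j0}, l_{ik}, f_{jk}, and the value of σ² are arbitrary illustrative choices, not priors estimated or assumed by ebpmf_log.

```r
set.seed(1)
n <- 50; p <- 40; K <- 2

l0 <- rnorm(n, mean = 1, sd = 0.5)           # background loadings l_{i0}
f0 <- rnorm(p, mean = 0, sd = 0.5)           # background factors f_{j0}
L  <- matrix(rexp(n * K), n, K)              # loadings l_{ik} (non-negative here)
Fm <- matrix(rnorm(p * K, sd = 0.5), p, K)   # factors f_{jk}
sigma2 <- rep(0.2, p)                        # "by_col": one variance per column j

# epsilon_{ij} ~ N(0, sigma2_j): the sd vector is laid out column by column
eps <- matrix(rnorm(n * p, sd = rep(sqrt(sigma2), each = n)), n, p)

# mu_{ij} = l_{i0} + f_{j0} + sum_k l_{ik} f_{jk} + epsilon_{ij}
mu <- outer(l0, rep(1, p)) + outer(rep(1, n), f0) + L %*% t(Fm) + eps

# y_{ij} ~ Poisson(exp(mu_{ij}))
Y <- matrix(rpois(n * p, lambda = exp(mu)), n, p)
```

Assuming the package is installed from GitHub (DongyueXie/stm), a matrix like Y could then be passed as ebpmf_log(Y, var_type = "by_col").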

The init_control argument is a list in which any of the following named components will override the default algorithm settings (as defined by ebpmf_log_init_control_default):

sigma2_init

The initial value of sigma2.

M_init

The initial value of the latent M.

init_tol

Convergence tolerance for the initialization.

init_maxiter

Maximum number of iterations for the initialization.

verbose

If TRUE, print the initialization progress.

printevery

A number controlling how often the initialization progress is printed.

ebpm_init

Whether to fit a single-gene model with ash_pois and use it as the initialization for VGA.

conv_type

Convergence criterion for the initial VGA fit: either 'elbo' or 'sigma2abs'.

n_cores

Number of cores used for initialization; values greater than 1 run it in parallel with the 'mclapply' function.

flash_est_sigma2

If TRUE, use flash to initialize sigma2.

log_init_for_non0y

If TRUE, use log(Y/exp(offset)) as the initial values for non-zero counts.

n_refit_flash_init

The number of times to refit flash with a different seed if no structure is found during initialization.

deal_with_no_init_factor

If no factor is found during initialization, either use 'reduce_var' to reduce the initial variance passed to flash, or use 'flash_dryrun' to run flash without providing the variance.
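The log_init_for_non0y option can be read literally as in the sketch below: for each non-zero count, the initial latent value is log(Y/exp(offset)) = log(Y) − offset. What offset contains is not stated on this page, so the sketch assumes offset[i, j] = l0[i] + f0[j]; the value used for zero counts (zero_value) is likewise a hypothetical choice. The function name init_M_non0y is illustrative only.

```r
# Hypothetical sketch of log_init_for_non0y (not the package's code).
# Assumptions: offset[i, j] = l0[i] + f0[j]; zero counts get `zero_value`.
init_M_non0y <- function(Y, l0, f0, zero_value = 0) {
  offset <- outer(l0, f0, "+")              # n x p matrix of offsets
  M <- matrix(zero_value, nrow(Y), ncol(Y))
  nz <- Y > 0                               # only non-zero counts have a finite log
  M[nz] <- log(Y[nz]) - offset[nz]          # log(Y / exp(offset))
  M
}

Y  <- matrix(c(0, 2, 5, 0), 2, 2)
M0 <- init_M_non0y(Y, l0 = c(0, 1), f0 = c(0, 0.5))
```

Restricting the log transform to non-zero counts avoids log(0) = -Inf; zero counts need some other starting value.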

The general_control argument is a list in which any of the following named components will override the default algorithm settings (as defined by ebpmf_log_general_control_default):

batch_size

Set this to 1000, 10000 or a similar value to reduce the memory usage of the VGA step by looping over subsets of the dataset.

maxiter

Maximum number of iterations allowed.

conv_tol

Convergence tolerance.

printevery

How often to print progress over iterations

verbose

If TRUE, print progress.

garbage_collection_every

How often to perform 'gc()' to reduce memory usage

save_init_val

If TRUE, return the initialization values of the latent mu and sigma2.

save_latent_M

If TRUE, return the latent M; note that it can be very large.

save_fit_every

How often to save intermediate results.

save_fit_path

Path where intermediate results are saved.

save_fit_name

File name under which intermediate results are saved.
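The batch_size setting trades memory for looping: instead of processing the whole matrix at once, the VGA step works on one subset at a time. The sketch below shows the generic pattern with a hypothetical helper make_batches that splits rows into consecutive blocks; whether the package batches rows or columns, and how it forms the subsets, is not stated here, so treat this as an illustration of the idea only.

```r
# Hypothetical helper: split indices 1..n into consecutive batches of at
# most batch_size elements.
make_batches <- function(n, batch_size) {
  idx <- seq_len(n)
  split(idx, ceiling(idx / batch_size))
}

batches <- make_batches(n = 25, batch_size = 10)

# Process one block of rows at a time; only Y[b, ] is materialized per step.
set.seed(2)
Y  <- matrix(rpois(25 * 4, lambda = 3), 25, 4)
rs <- numeric(nrow(Y))
for (b in batches) {
  rs[b] <- rowSums(Y[b, , drop = FALSE])
}
```

The per-batch result matches the full computation; only the peak memory of intermediate objects changes.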

The flash_control argument is a list in which any of the following named components will override the default algorithm settings (as defined by ebpmf_log_flash_control_default):

ebnm.fn

see '?flash', 'ebnm_fn'.

ebnm.fn.offset

The prior for l_0 and f_0 when they are not fixed.

loadings_sign

See the sign_constraints argument in '?flash_greedy_init_default'; must be consistent with ebnm.fn.

factors_sign

See the sign_constraints argument in '?flash_greedy_init_default'; must be consistent with ebnm.fn.

fix_l0

Whether to fix l_0.

fix_f0

Whether to fix f_0.

Kmax

see '?flash', 'greedy_Kmax'.

add_greedy_Kmax

The Kmax used when greedily adding factors (add_greedy) during the iterations.

add_greedy_warmstart

see '?flash_greedy'

add_greedy_extrapolate

see '?flash_greedy'

add_greedy_every

perform flash_greedy every 'add_greedy_every' iterations.

maxiter_backfitting

Maximum number of iterations for flash backfitting; see '?flash_backfit'.

backfit_extrapolate

see '?flash_backfit'

backfit_warmstart

see '?flash_backfit'

verbose_flash

Whether to print flash updates.
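Each control list only needs the components you want to change; the rest come from the corresponding defaults function (here ebpmf_log_flash_control_default). The base-R sketch below mimics that override behavior with modifyList. The default values shown are placeholders, not the package's actual defaults, and merge_control (including its warning on unknown names) is a hypothetical helper, not something this page says the package does.

```r
# Placeholder defaults: NOT the package's real values.
defaults <- list(Kmax = 10, add_greedy_every = 10, fix_l0 = FALSE, fix_f0 = FALSE)

# What a caller might pass, e.g. ebpmf_log(Y, flash_control = user)
user <- list(Kmax = 5, fix_l0 = TRUE)

# Hypothetical helper: named components in `user` override `defaults`.
# Warning on unknown names is this sketch's own design choice.
merge_control <- function(defaults, user) {
  unknown <- setdiff(names(user), names(defaults))
  if (length(unknown) > 0) {
    warning("ignoring unknown control names: ", paste(unknown, collapse = ", "))
    user <- user[setdiff(names(user), unknown)]
  }
  modifyList(defaults, user)
}

ctrl <- merge_control(defaults, user)
```

Checking names before merging catches typos such as "kmax", which a plain modifyList would silently add as a new, unused component.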

The vga_control argument is a list in which any of the following named components will override the default algorithm settings (as defined by ebpmf_log_vga_control_default):

maxiter_vga

Maximum number of Newton iterations in the VGA step.

vga_tol

tolerance for stopping the optimization.
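The VGA step replaces the posterior of each latent μ_ij with a Gaussian q(μ_ij) = N(m_ij, v_ij). The sketch below is a standard per-entry VGA for a Poisson observation with a Gaussian prior N(η, s2), where η plays the role of l_{i0} + f_{j0} + Σ_k l_{ik}f_{jk} and s2 the role of σ²_ij. It uses a clamped Newton step in m and a fixed-point update in v; it illustrates the idea and is not necessarily the exact update used by the package. The function vga_update and its arguments are hypothetical; maxiter and tol only loosely correspond to maxiter_vga and vga_tol.

```r
# Per-entry VGA sketch: y ~ Poisson(exp(mu)), mu ~ N(eta, s2), q(mu) = N(m, v).
# ELBO up to constants: y*m - exp(m + v/2) - ((m - eta)^2 + v)/(2*s2) + log(v)/2
vga_update <- function(y, eta, s2, maxiter = 100, tol = 1e-8) {
  m <- eta
  v <- rep_len(s2, length(y))
  for (iter in seq_len(maxiter)) {
    m_old <- m
    # Newton step in m: the ELBO is concave in m.
    e    <- exp(m + v / 2)
    grad <- y - e - (m - eta) / s2
    hess <- -e - 1 / s2
    # Clamp the step to [-1, 1] so exp() cannot overflow after an overshoot.
    m <- m + pmax(pmin(-grad / hess, 1), -1)
    # Stationarity in v: 1/v = exp(m + v/2) + 1/s2; the fixed-point map is a
    # contraction when s2 < 8.
    for (k in 1:20) v <- 1 / (exp(m + v / 2) + 1 / s2)
    if (max(abs(m - m_old)) < tol) break
  }
  list(m = m, v = v)
}

y <- c(0, 3, 10); eta <- c(0, 1, 2); s2 <- 1
fit <- vga_update(y, eta, s2)
```

Because each entry's update depends only on its own y, η and s2, the computation is fully vectorized and can be split across batches of entries.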

The sigma2_control argument is a list in which any of the following named components will override the default algorithm settings (as defined by ebpmf_log_sigma2_control_default):

est_sigma2

Whether to estimate the variance term or fix it at sigma2_init.

a0,b0

Inverse-Gamma(a0,b0) prior on sigma2 for regularization.

cap_var_mean_ratio

Update sigma2 only when var/mean > (1 + cap_var_mean_ratio); i.e., once the overdispersion is low enough, stop updating sigma2 to speed up convergence.

return_sigma2_trace

If TRUE, return the sigma2 values across iterations. For internal use only.
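The sketch below combines two of the settings above for var_type = "by_col". It skips columns whose var/mean ratio is at most 1 + cap_var_mean_ratio, and for the remaining columns sets σ²_j to the mode of the conjugate Inverse-Gamma(a0 + n/2, b0 + S_j/2) conditional, where S_j is the sum of expected squared residuals in column j. Both the choice to compute var/mean on the raw counts of each column and the use of the Inverse-Gamma mode are assumptions made for illustration, not the package's documented update. The function update_sigma2_by_col and its default a0, b0 values are hypothetical.

```r
# Hypothetical sketch for var_type = "by_col".
# R: n x p matrix of expected squared residuals E[(mu_ij - eta_ij)^2],
#    assumed to be supplied by the VGA step.
update_sigma2_by_col <- function(Y, R, sigma2, a0 = 1, b0 = 1,
                                 cap_var_mean_ratio = 0.1) {
  n <- nrow(Y)
  col_mean <- colMeans(Y)
  col_var  <- apply(Y, 2, var)
  # Assumption: overdispersion is measured on the raw counts of each column.
  ratio <- ifelse(col_mean > 0, col_var / col_mean, 0)
  upd   <- ratio > 1 + cap_var_mean_ratio
  # Mode of Inverse-Gamma(a0 + n/2, b0 + S_j/2) is (b0 + S_j/2) / (a0 + n/2 + 1).
  S <- colSums(R)
  sigma2[upd] <- (b0 + S[upd] / 2) / (a0 + n / 2 + 1)
  sigma2
}

Y  <- matrix(c(0, 0, 10, 10,   # overdispersed column: var/mean > 1.1
               2, 2, 2, 2),    # no overdispersion: left unchanged
             nrow = 4, ncol = 2)
R  <- matrix(0.5, nrow = 4, ncol = 2)
s2 <- update_sigma2_by_col(Y, R, sigma2 = c(1, 1))
```

The a0, b0 prior keeps σ²_j away from zero when S_j is small, which is the regularization role the a0,b0 entry describes.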

Value

A list of:

fit_flash:

The fitted flash object.

elbo:

The evidence lower bound (ELBO) value.

K_trace:

Trace of the number of factors across iterations.

elbo_trace:

Trace of the ELBO across iterations.

sigma2:

the variance estimates

run_time:

run time of the algorithm


DongyueXie/stm documentation built on June 18, 2024, 11:01 a.m.