flash_greedy: Greedily add factors to a flash object

View source: R/flash_greedy.R

flash_greedyR Documentation

Greedily add factors to a flash object

Description

Adds factor/loadings pairs to a flash object in a "greedy" manner. Up to Kmax pairs are added one at a time. At each step, flash_greedy attempts to find an optimal additional (rank-one) factor given all previously added factors. The additional factor is retained if it increases the variational lower bound (ELBO); otherwise, fitting terminates. See flash for examples of usage.

Usage

flash_greedy(
  flash,
  Kmax = 1,
  ebnm_fn = ebnm_point_normal,
  init_fn = NULL,
  extrapolate = FALSE,
  warmstart = FALSE,
  maxiter = 500,
  tol = NULL,
  verbose = NULL
)

Arguments

flash

A flash or flash_fit object to which factors are to be added.

Kmax

The maximum number of factors to be added. This will not necessarily be the total number of factors added by flash_greedy, since factors are only added as long as they increase the ELBO.

ebnm_fn

The function or functions used to solve the empirical Bayes normal means (EBNM) subproblems. Most importantly, these functions specify the families of distributions G_\ell^{(k)} and G_f^{(k)} to which the priors on loadings and factors g_\ell^{(k)} and g_f^{(k)} are assumed to belong. If the same function is to be used for both loadings L and factors F, then ebnm_fn can be a single function. If one function is to be used for loadings and a second for factors, then ebnm_fn should be a list of length two, with the first element giving the function for loadings and the second the function for factors. If different functions are to be used for different values of k, then factor/loadings pairs must be added successively using multiple calls to either flash_greedy or flash_factors_init.

Any EBNM function provided by package ebnm can be used as input. Non-default arguments to parameters can be supplied using the helper function flash_ebnm. Custom EBNM functions can also be used: for details, see flash_ebnm.

init_fn

The function used to initialize factor/loadings pairs. Functions flash_greedy_init_default, flash_greedy_init_softImpute, and flash_greedy_init_irlba have been supplied; note, in particular, that flash_greedy_init_softImpute can yield better results than the default initialization function when there is missing data. Custom initialization functions may also be used. If init_fn = NULL then flash_greedy_init_default will be used, with an attempt made to set argument sign_constraints appropriately via test calls to the EBNM function(s) specified by parameter ebnm_fn. If factors or loadings are constrained in some other fashion (e.g., bounded support), then the initialization function should be modified to account for the constraints — otherwise, the greedy algorithm can stop adding factor/loadings pairs too early. Custom initialization functions should accept a single parameter referring to a flash_fit object and should output a list consisting of two vectors, which will be used as initial values for the new loadings \ell_{\cdot k} and the new factor f_{\cdot k}. Typically, a custom initialization function will extract the matrix of residuals from the flash_fit object using method residuals.flash_fit and then return a (possibly constrained) rank-one approximation to the matrix of residuals. See Examples below.

extrapolate

Whether to use an extrapolation technique inspired by Ang and Gillis (2019) to accelerate the fitting process. Control parameters are handled via global options and can be set by calling options("extrapolate.control") <- control.param.

warmstart

Whether to use "warmstarts" when solving the EBNM subproblems by initializing solutions at the previous value of the fitted prior \hat{g}. An important side effect of warmstarts for ashr-like prior families is to fix the grid at its initial setting. Fixing the grid can lead to poor fits if there are large changes in the scale of the estimated prior over the course of the fitting process. However, allowing the grid to vary can occasionally result in decreases in ELBO.

maxiter

The maximum number of iterations when optimizing a greedily added factor/loadings pair.

tol

The convergence tolerance parameter. At each iteration, the fit is compared to the fit from the previous iteration using a convergence criterion function (by default, the difference in ELBO, but the criterion can be changed via flash_set_conv_crit). When the value returned by this function is less than or equal to tol, the newly added factor/loadings pair is considered to have "converged," so that flash_greedy moves on and attempts to add another new factor (or, if the maximum number of factors Kmax has been reached, the process terminates). Note that specifying tol here will override any value set by flash_set_conv_crit; to use the "global" tolerance parameter, tol must be left unspecified (NULL). If tol = NULL and a global tolerance parameter has not been set, then the default tolerance used is np\sqrt{\epsilon}, where n is the number of rows in the dataset, p is the number of columns, and \epsilon is equal to .Machine$double.eps.

verbose

When and how to display progress updates. Set to 0 for none, 1 for updates after a factor is added or a backfit is completed, 2 for additional notifications about the variational lower bound, and 3 for updates after every iteration. It is also possible to output a single tab-delimited table of values using function flash_set_verbose with verbose = -1.

Value

The flash object from argument flash, with up to Kmax new factor/loadings pairs "greedily" added.

See Also

flash_greedy_init_default, flash_greedy_init_softImpute, flash_greedy_init_irlba

Examples

# The following are examples of advanced usage. See ?flash for basic usage.

# Increase the maximum number of iterations in the default initialization
#   method.
my_init_fn <- function(f) flash_greedy_init_default(f, maxiter = 500)
fl <- flash_init(gtex) %>%
  flash_greedy(init_fn = my_init_fn)

# Use a custom initialization function that wraps function nmf from
#   package RcppML.
nmf_init_fn <- function(f) {
  nmf_res <- RcppML::nmf(resid(f), k = 1, verbose = FALSE)
  return(list(as.vector(nmf_res$w), as.vector(nmf_res$h)))
}
fl.nmf <- flash_init(gtex) %>%
  flash_greedy(ebnm_fn = ebnm_unimodal_nonnegative,
               init_fn = nmf_init_fn)


flashier documentation built on Oct. 17, 2023, 5:07 p.m.