| flash | R Documentation |
Fits an empirical Bayes matrix factorization (see Details for a
description of the model). The resulting fit is referred to as a "flash"
object (short for Factors and Loadings using Adaptive SHrinkage). Two
interfaces are provided. The flash function provides a simple
interface that allows a flash object to be fit in a single pass, while
flash_xxx functions are pipeable functions that allow for more
complex flash objects to be fit incrementally (available functions are
listed below under See Also). See the vignettes and
Examples for usage.
flash(
data,
S = NULL,
ebnm_fn = ebnm_point_normal,
var_type = 0L,
greedy_Kmax = 50L,
backfit = FALSE,
nullcheck = TRUE,
verbose = 1L
)
data |
The observations. Usually a matrix, but can also be a sparse
matrix of class |
S |
The standard errors. Can be |
ebnm_fn |
The function or functions used to solve the empirical Bayes
normal means (EBNM) subproblems. Most importantly, these functions specify
the families of distributions Any EBNM function provided by package |
var_type |
Describes the structure of the estimated residual variance.
Can be Note that if any portion of the residual variance is to be estimated, then
it is usually faster to set |
greedy_Kmax |
The maximum number of factors to be added. This will not
necessarily be the total number of factors added by |
backfit |
A "greedy" fit is performed by adding up to
|
nullcheck |
If |
verbose |
When and how to display progress updates. Set to
|
If Y is an n \times p data matrix, then the rank-one
empirical Bayes matrix factorization model is:
Y = \ell f' + E,
where \ell is an
n-vector of loadings, f is a
p-vector of factors, and E is an
n \times p matrix of residuals (or "errors").
Additionally:
e_{ij} \sim N(0, s_{ij}^2): i = 1, ..., n; j = 1, ..., p
\ell \sim g_\ell \in G_\ell
f \sim g_f \in G_f.
The residual variance parameters s_{ij}^2 are constrained to have
a simple structure and are fit via maximum likelihood. (For example, one
might assume that all standard errors are identical: s_{ij}^2 = s^2
for some s^2 and for all i, j).
The functions g_\ell and g_f are assumed to belong to
some families of priors G_\ell and G_f that are
specified in advance, and are estimated via variational approximation.
The general rank-K empirical Bayes matrix factorization model is:
Y = LF' + E
or
y_{ij} = \sum_k \ell_{ik} f_{jk} + e_{ij}: i = 1, ..., n; j = 1, ..., p,
where L is now a matrix of loadings and F is a matrix of
factors.
Separate priors g_\ell^{(k)} and g_f^{(k)} are estimated via
empirical Bayes, and different prior families may be used for different
values of k. In general, then:
e_{ij} \sim N(0, s_{ij}^2): i = 1, ..., n; j = 1, ..., p
\ell_{ik} \sim g_\ell^{(k)} \in G_\ell^{(k)}: i = 1, ..., n; k = 1, ..., K
f_{ik} \sim g_f^{(k)} \in G_f^{(k)}: j = 1, ..., p; k = 1, ..., K.
Typically, G_\ell^{(k)} and G_f^{(k)} will be closed under
scaling, in which case \ell_k and f_k are only identifiable
up to a scaling factor d_k. In other words, we can write:
Y = LDF' + E,
where D is a diagonal matrix with diagonal entries d_1, ..., d_K.
The model can then be made identifiable by constraining the scale of
\ell_k and f_k for k = 1, ..., K.
A flash object. Contains elements:
n_factorsThe total number of factor/loadings pairs K
in the fitted model.
pveThe proportion of variance explained by each factor/loadings pair. Since factors and loadings are not required to be orthogonal, this should be interpreted loosely: for example, the total proportion of variance explained could be larger than 1.
elboThe variational lower bound achieved by the fitted model.
residuals_sdEstimated residual standard deviations (these
include any variance component given as an argument to S).
L_pm, L_psd, L_lfsrPosterior means,
standard deviations, and local false sign rates for loadings L.
F_pm, F_psd, F_lfsrPosterior means,
standard deviations, and local false sign rates for factors F.
L_ghatThe fitted priors on loadings
\hat{g}_\ell^{(k)}.
F_ghatThe fitted priors on factors
\hat{g}_f^{(k)}.
samplerA function that takes a single argument
nsamp and returns nsamp samples from the posterior
distributions for factors F and loadings L.
flash_fitA flash_fit object. Used by
flash when fitting is not performed all at once, but
incrementally via calls to various flash_xxx functions.
The following methods are available:
fitted.flashReturns the "fitted values"
E(LF') = E(L) E(F)'.
residuals.flashReturns the expected residuals
Y - E(LF') = Y - E(L) E(F)'.
ldf.flashReturns an LDF decomposition (see
Details above), with columns of L and F scaled
as specified by the user.
Wei Wang and Matthew Stephens (2021). "Empirical Bayes matrix factorization." Journal of Machine Learning Research 22, 1–40.
flash_init, flash_greedy,
flash_backfit, and flash_nullcheck. For more
advanced functionality, see flash_factors_init,
flash_factors_fix, flash_factors_set_to_zero,
flash_factors_remove, flash_set_verbose, and
flash_set_conv_crit.
For extracting useful data from flash objects, see
fitted.flash, residuals.flash, and
ldf.flash.
data(gtex)
# Fit up to 3 factors and backfit.
fl <- flash(gtex, greedy_Kmax = 3L, backfit = TRUE)
# This is equivalent to the series of calls:
fl <- flash_init(gtex) %>%
flash_greedy(Kmax = 3L) %>%
flash_backfit() %>%
flash_nullcheck()
# Fit a unimodal distribution with mean zero to each set of loadings
# and a scale mixture of normals with mean zero to each factor.
fl <- flash(gtex,
ebnm_fn = c(ebnm_unimodal,
ebnm_normal_scale_mixture),
greedy_Kmax = 3)
# Fit point-laplace priors using a non-default optimization method.
fl <- flash(gtex,
ebnm_fn = flash_ebnm(prior_family = "point_laplace",
optmethod = "trust"),
greedy_Kmax = 3)
# Fit a "Kronecker" (rank-one) variance structure (this can be slow).
fl <- flash(gtex, var_type = c(1, 2), greedy_Kmax = 3L)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.