flash | R Documentation |
Fits an empirical Bayes matrix factorization (see Details for a
description of the model). The resulting fit is referred to as a "flash"
object (short for Factors and Loadings using Adaptive SHrinkage). Two
interfaces are provided. The flash
function provides a simple
interface that allows a flash object to be fit in a single pass, while
flash_xxx
functions are pipeable functions that allow for more
complex flash objects to be fit incrementally (available functions are
listed below under See Also). See the vignettes and
Examples for usage.
flash(
data,
S = NULL,
ebnm_fn = ebnm_point_normal,
var_type = 0L,
greedy_Kmax = 50L,
backfit = FALSE,
nullcheck = TRUE,
verbose = 1L
)
data |
The observations. Usually a matrix, but can also be a sparse
matrix of class |
S |
The standard errors. Can be |
ebnm_fn |
The function or functions used to solve the empirical Bayes
normal means (EBNM) subproblems. Most importantly, these functions specify
the families of distributions Any EBNM function provided by package |
var_type |
Describes the structure of the estimated residual variance.
Can be Note that if any portion of the residual variance is to be estimated, then
it is usually faster to set |
greedy_Kmax |
The maximum number of factors to be added. This will not
necessarily be the total number of factors added by |
backfit |
A "greedy" fit is performed by adding up to
|
nullcheck |
If |
verbose |
When and how to display progress updates. Set to
|
If Y
is an n \times p
data matrix, then the rank-one
empirical Bayes matrix factorization model is:
Y = \ell f' + E,
where \ell
is an
n
-vector of loadings, f
is a
p
-vector of factors, and E
is an
n \times p
matrix of residuals (or "errors").
Additionally:
e_{ij} \sim N(0, s_{ij}^2): i = 1, ..., n; j = 1, ..., p
\ell \sim g_\ell \in G_\ell
f \sim g_f \in G_f.
The residual variance parameters s_{ij}^2
are constrained to have
a simple structure and are fit via maximum likelihood. (For example, one
might assume that all standard errors are identical: s_{ij}^2 = s^2
for some s^2
and for all i
, j
).
The functions g_\ell
and g_f
are assumed to belong to
some families of priors G_\ell
and G_f
that are
specified in advance, and are estimated via variational approximation.
The general rank-K
empirical Bayes matrix factorization model is:
Y = LF' + E
or
y_{ij} = \sum_k \ell_{ik} f_{jk} + e_{ij}: i = 1, ..., n; j = 1, ..., p,
where L
is now a matrix of loadings and F
is a matrix of
factors.
Separate priors g_\ell^{(k)}
and g_f^{(k)}
are estimated via
empirical Bayes, and different prior families may be used for different
values of k
. In general, then:
e_{ij} \sim N(0, s_{ij}^2): i = 1, ..., n; j = 1, ..., p
\ell_{ik} \sim g_\ell^{(k)} \in G_\ell^{(k)}: i = 1, ..., n; k = 1, ..., K
f_{ik} \sim g_f^{(k)} \in G_f^{(k)}: j = 1, ..., p; k = 1, ..., K.
Typically, G_\ell^{(k)}
and G_f^{(k)}
will be closed under
scaling, in which case \ell_k
and f_k
are only identifiable
up to a scaling factor d_k
. In other words, we can write:
Y = LDF' + E,
where D
is a diagonal matrix with diagonal entries d_1, ..., d_K
.
The model can then be made identifiable by constraining the scale of
\ell_k
and f_k
for k = 1, ..., K
.
A flash
object. Contains elements:
n_factors
The total number of factor/loadings pairs K
in the fitted model.
pve
The proportion of variance explained by each factor/loadings pair. Since factors and loadings are not required to be orthogonal, this should be interpreted loosely: for example, the total proportion of variance explained could be larger than 1.
elbo
The variational lower bound achieved by the fitted model.
residuals_sd
Estimated residual standard deviations (these
include any variance component given as an argument to S
).
L_pm, L_psd, L_lfsr
Posterior means,
standard deviations, and local false sign rates for loadings L
.
F_pm, F_psd, F_lfsr
Posterior means,
standard deviations, and local false sign rates for factors F
.
L_ghat
The fitted priors on loadings
\hat{g}_\ell^{(k)}
.
F_ghat
The fitted priors on factors
\hat{g}_f^{(k)}
.
sampler
A function that takes a single argument
nsamp
and returns nsamp
samples from the posterior
distributions for factors F
and loadings L
.
flash_fit
A flash_fit
object. Used by
flash
when fitting is not performed all at once, but
incrementally via calls to various flash_xxx
functions.
The following methods are available:
fitted.flash
Returns the "fitted values"
E(LF') = E(L) E(F)'
.
residuals.flash
Returns the expected residuals
Y - E(LF') = Y - E(L) E(F)'
.
ldf.flash
Returns an LDF
decomposition (see
Details above), with columns of L
and F
scaled
as specified by the user.
Wei Wang and Matthew Stephens (2021). "Empirical Bayes matrix factorization." Journal of Machine Learning Research 22, 1–40.
flash_init
, flash_greedy
,
flash_backfit
, and flash_nullcheck
. For more
advanced functionality, see flash_factors_init
,
flash_factors_fix
, flash_factors_set_to_zero
,
flash_factors_remove
, flash_set_verbose
, and
flash_set_conv_crit
.
For extracting useful data from flash
objects, see
fitted.flash
, residuals.flash
, and
ldf.flash
.
data(gtex)
# Fit up to 3 factors and backfit.
fl <- flash(gtex, greedy_Kmax = 3L, backfit = TRUE)
# This is equivalent to the series of calls:
fl <- flash_init(gtex) %>%
flash_greedy(Kmax = 3L) %>%
flash_backfit() %>%
flash_nullcheck()
# Fit a unimodal distribution with mean zero to each set of loadings
# and a scale mixture of normals with mean zero to each factor.
fl <- flash(gtex,
ebnm_fn = c(ebnm_unimodal,
ebnm_normal_scale_mixture),
greedy_Kmax = 3)
# Fit point-laplace priors using a non-default optimization method.
fl <- flash(gtex,
ebnm_fn = flash_ebnm(prior_family = "point_laplace",
optmethod = "trust"),
greedy_Kmax = 3)
# Fit a "Kronecker" (rank-one) variance structure (this can be slow).
fl <- flash(gtex, var_type = c(1, 2), greedy_Kmax = 3L)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.