flash: Fit Empirical Bayes Matrix Factorization

flash

R Documentation

Fit Empirical Bayes Matrix Factorization

Description

This is the main interface for fitting Empirical Bayes Matrix Factorization (EBMF) models, based on the algorithms of Wang and Stephens. By default, flash runs only the greedy algorithm and returns the result. To follow the greedy fit with backfitting, set backfit = TRUE.
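
As a rough illustration of the two modes described above, the sketch below fits a small simulated matrix with the greedy algorithm alone, with greedy followed by backfitting, and by backfitting an existing fit. The simulated data and the object names (`fit_greedy`, `fit_both`, `fit_refined`) are made up for this illustration, and the calls are guarded so that nothing is fitted when flashr is not installed.

```r
# Simulate a small rank-one data matrix (hypothetical data for illustration).
set.seed(1)
l <- rnorm(20)
f <- rnorm(50)
Y <- outer(l, f) + matrix(rnorm(20 * 50), nrow = 20)

if (requireNamespace("flashr", quietly = TRUE)) {
  # Default behaviour: greedy algorithm only.
  fit_greedy <- flashr::flash(Y, verbose = FALSE)

  # Greedy algorithm followed by backfitting in a single call.
  fit_both <- flashr::flash(Y, backfit = TRUE, verbose = FALSE)

  # Refine an existing fit by backfitting only. With greedy = FALSE,
  # f_init must be supplied.
  fit_refined <- flashr::flash(Y, f_init = fit_greedy, greedy = FALSE,
                               backfit = TRUE, verbose = FALSE)
}
```

The third call is a sketch of one way to separate the two stages; whether that is preferable to a single call with backfit = TRUE depends on the analysis.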

Usage

flash(data, Kmax = 100, f_init = NULL, var_type = c("by_column",
  "by_row", "constant", "zero", "kroneker"), init_fn = "udv_si",
  tol = 0.01, ebnm_fn = "ebnm_pn", ebnm_param = NULL,
  verbose = TRUE, nullcheck = TRUE, seed = 123, greedy = TRUE,
  backfit = FALSE)

Arguments

`data`	An n by p matrix or a flash data object created using `flash_set_data`.
`Kmax`	The maximum number of factors to be added to the flash object.
`f_init`	The flash object or flash fit object to which new factors are to be added. If `f_init = NULL`, then a new flash object is created.
`var_type`	The type of variance structure to assume for residuals. Options include:
	`"by_column"`: Residuals in any given column are assumed to have the same variance.
	`"by_row"`: Residuals in any given row are assumed to have the same variance.
	`"constant"`: All residuals are assumed to have the same variance.
	`"zero"`: The variance of the residuals is fixed. To use this variance type, the standard errors must be specified via parameter `S` when using `flash_set_data` to set the flash data object.
	`"kroneker"`: This variance type has not yet been implemented.
`init_fn`	The function used to initialize factors. Options include:
	`"udv_si"`: A simple wrapper to `softImpute` that provides a rank-one initialization. Uses option `type = "als"`.
	`"udv_si_svd"`: Uses `softImpute` with option `type = "svd"`.
	`"udv_svd"`: A simple wrapper to `svd`.
	`"udv_random"`: A random initialization of factors.
	A user-specified function can also be used. This function should take parameters `(Y, K)`, where `Y` is an n by p matrix of data (or a flash data object) and `K` is the number of factors. It should output a list with elements `(u, d, v)`, where `u` is an n by K matrix, `v` is a p by K matrix, and `d` is a vector of length K. (If the input data includes missing values, then the function must be able to deal with missing values in its input matrix.)
`tol`	The convergence tolerance. The fit is considered converged once the objective changes by less than `tol` in a single iteration.
`ebnm_fn`	The function used to solve the Empirical Bayes Normal Means problem. Accepts either a single character string (giving the name of the function) or a list with fields `l` and `f` (specifying different functions to be used for loadings and factors). Options include:
	`"ebnm_ash"`: A wrapper to the function `ash`.
	`"ebnm_pn"`: A wrapper to function `ebnm_point_normal` in package ebnm.
	`"ebnm_pl"`: A wrapper to function `ebnm_point_laplace` in package ebnm.
`ebnm_param`	A named list containing parameters to be passed to `ebnm_fn` when optimizing. A list with fields `l` and `f` (each of which is a named list) will separately supply parameters for loadings and factors. If parameter `warmstart` is used, the current value of `g` (if available) will be passed to `ebnm_fn`. (So, `ebnm_fn` should accept a parameter named `g`, not one named `warmstart`.) Set `ebnm_param` to `NULL` to use defaults.
`verbose`	If `TRUE`, various progress updates will be printed.
`nullcheck`	If `TRUE`, then after running hill-climbing updates, `flash` will check whether the achieved optimum is better than setting the factor to zero. If the check is performed and fails, then the factor will be set to zero in the returned fit.
`seed`	A random number seed to set before running `flash`, for reproducibility. Set to `NULL` if you don't want the seed set. (The seed can affect initialization when there are missing data; otherwise the algorithm is deterministic.)
`greedy`	If `TRUE`, factors are added via the greedy algorithm. If `FALSE`, then `f_init` must be supplied.
`backfit`	If `TRUE`, factors are refined via the backfitting algorithm.
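
The `(Y, K)` to `(u, d, v)` contract described for a user-specified `init_fn` can be sketched with base R alone. The function name `init_svd_topk` is hypothetical, and the sketch assumes that `Y` arrives as a plain numeric matrix with no missing values; a function that may receive a flash data object or missing entries would need extra handling.

```r
# Hypothetical user-defined initializer built on base R's svd().
# Assumes Y is a plain numeric matrix without missing values.
init_svd_topk <- function(Y, K = 1) {
  s <- svd(Y, nu = K, nv = K)
  list(u = s$u,                 # n by K matrix
       d = s$d[seq_len(K)],     # vector of length K
       v = s$v)                 # p by K matrix
}

# Check the shapes of the output on a small simulated matrix.
set.seed(1)
Y <- matrix(rnorm(20 * 100), nrow = 20)
init <- init_svd_topk(Y, K = 2)
dim(init$u)
dim(init$v)
length(init$d)
```

Under these assumptions, such a function could then be supplied as `flash(Y, init_fn = init_svd_topk)`.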

Value

A flash object. Use flash_get_ldf to access standardized loadings and factors; use flash_get_fitted_values to access fitted LF'.
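
To make the relationship between standardized loadings and factors and the fitted LF' concrete, here is a small base R sketch. It assumes that "standardized" means each column of l and f is scaled to unit Euclidean norm, with the scale collected in the weights d; the matrices `L` and `Fmat` are simulated stand-ins, not output from flash.

```r
set.seed(1)
L    <- matrix(rnorm(20 * 2), ncol = 2)    # hypothetical loadings (n by K)
Fmat <- matrix(rnorm(100 * 2), ncol = 2)   # hypothetical factors (p by K)

# Scale each column to unit norm and collect the scales in d.
l_norm <- sqrt(colSums(L^2))
f_norm <- sqrt(colSums(Fmat^2))
l <- sweep(L, 2, l_norm, "/")
f <- sweep(Fmat, 2, f_norm, "/")
d <- l_norm * f_norm

# The standardized pieces reproduce LF' as l diag(d) f'.
fitted_from_ldf <- l %*% diag(d, nrow = length(d)) %*% t(f)
all.equal(fitted_from_ldf, L %*% t(Fmat))
```

Under this assumption, d plays a role analogous to singular values: it carries the overall scale of each factor, while l and f carry only direction.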

Examples


set.seed(1) # for reproducibility
ftrue = matrix(rnorm(200), ncol=2)
ltrue = matrix(rnorm(40), ncol=2)
ltrue[1:10, 1] = 0 # set up some sparsity
ltrue[11:20, 2] = 0
Y = ltrue %*% t(ftrue) + rnorm(2000) # set up a simulated matrix
f = flash(Y)
ldf = f$ldf

# Show the weights, analogous to singular values showing importance
# of each factor.
ldf$d

# Plot true l against estimated l; with this seed it turns out the
# 2nd loading/factor corresponds to the first column of ltrue.
plot(ltrue[,1], ldf$l[,2])

# Plot true f against estimated f (note estimate is normalized).
plot(ftrue[,1], ldf$f[,2])

# Plot true lf' against estimated lf'; the scale of the estimate
# matches the data.
plot(ltrue %*% t(ftrue), f$fitted_values)

# Example to use the more flexible ebnm function in ashr.
f2 = flash(Y, ebnm_fn="ebnm_ash")

# Example to show how to pass parameters to ashr (may be most
# useful for research use).
f3 = flash(Y,
           ebnm_fn="ebnm_ash",
           ebnm_param=list(mixcompdist="normal", method="fdr"))

# Example to show how to separately specify parameters for factors
# and loadings.
f4 = flash(Y,
           ebnm_fn=list(l="ebnm_pn", f="ebnm_ash"),
           ebnm_param=list(l=list(),
                           f=list(g=ashr::normalmix(1,0,1), fixg=TRUE)))

# Example to show how to use a different initialization function.
library(softImpute)
f5 = flash(Y, init_fn=function(x, K=1){softImpute(x, K, lambda=10)})
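
The `var_type = "zero"` option described under Arguments requires standard errors to be supplied through `flash_set_data`. The sketch below shows one way this might look. Passing the data matrix as the first argument of `flash_set_data` and giving `S` as a matrix with the same dimensions as the data are assumptions made for this illustration, and the calls are guarded so that nothing runs when flashr is not installed.

```r
# Simulated rank-one data with known unit noise (hypothetical example data).
set.seed(1)
Y <- outer(rnorm(20), rnorm(50)) + matrix(rnorm(20 * 50), nrow = 20)

# Known standard errors, one per entry of Y (assumed form for S).
S <- matrix(1, nrow = nrow(Y), ncol = ncol(Y))

if (requireNamespace("flashr", quietly = TRUE)) {
  # Attach the standard errors to the data, then fix the residual variance.
  flash_data <- flashr::flash_set_data(Y, S = S)
  fit_zero <- flashr::flash(flash_data, var_type = "zero", verbose = FALSE)
}
```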

stephenslab/flashr documentation built on Dec. 4, 2024, 9:24 p.m.