flash_add_greedy: Fit Empirical Bayes Matrix Factorization (greedy algorithm)

View source: R/greedy.R

Description

This implements the greedy algorithm from Wang and Stephens. It can be used to add factors to an existing fit or to start from scratch. It adds factors iteratively, at each stage adding a new factor and then optimizing it. It is "greedy" in that it does not return to re-optimize previously added factors. The function stops when an added factor contributes nothing or when Kmax factors have been added. Each new factor is initialized by applying the function init_fn to the residuals obtained after removing the previously fitted factors.
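The greedy strategy can be sketched in base R to show its shape: fit one factor to the current residuals, subtract it, and never revisit earlier factors. This is a simplified illustration, not the flashr implementation. It replaces the empirical Bayes fit of each factor with a plain rank-one SVD of the residuals (so there is no separate optimization step), and it stands in for "contributes nothing" with a hypothetical threshold `min_d` on the new factor's weight.

```r
# Simplified sketch of greedy rank-one factorization (NOT flashr's EB fit).
# Each stage fits one factor to the current residuals via a rank-one SVD and
# never re-optimizes earlier factors. `min_d` is a hypothetical threshold.
greedy_rank_one_sketch <- function(Y, Kmax = 100, min_d = 1e-8) {
  R <- Y                           # residuals after removing fitted factors
  Lmat <- matrix(0, nrow(Y), 0)    # loadings, one column per factor
  Fmat <- matrix(0, ncol(Y), 0)    # factors, one column per factor
  d <- numeric(0)                  # factor weights
  for (k in seq_len(Kmax)) {
    s <- svd(R, nu = 1, nv = 1)    # rank-one fit to the current residuals
    if (s$d[1] < min_d) break      # new factor contributes (almost) nothing
    Lmat <- cbind(Lmat, s$u[, 1])
    Fmat <- cbind(Fmat, s$v[, 1])
    d <- c(d, s$d[1])
    R <- R - s$d[1] * outer(s$u[, 1], s$v[, 1])  # remove it from residuals
  }
  list(L = Lmat, F = Fmat, d = d, residuals = R)
}

set.seed(1)
Y <- outer(rnorm(20), rnorm(5)) + 0.1 * matrix(rnorm(100), nrow = 20)
fit <- greedy_rank_one_sketch(Y, Kmax = 3)
fit$d   # weights of the three greedily added factors
```

Because each stage only sees residuals, stopping early is cheap: on an exactly rank-one matrix the second stage finds nothing left to fit and the loop ends after one factor.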

Usage

flash_add_greedy(data, Kmax = 100, f_init = NULL,
  var_type = c("by_column", "by_row", "constant", "zero", "kroneker"),
  init_fn = "udv_si", tol = 0.01, ebnm_fn = "ebnm_pn",
  ebnm_param = NULL, verbose = TRUE, nullcheck = TRUE, seed = 123)

Arguments

data

An n by p matrix or a flash data object created using flash_set_data.

Kmax

The maximum number of factors to be added to the flash object.

f_init

The flash object or flash fit object to which new factors are to be added. If f_init = NULL, then a new flash object is created.

var_type

The type of variance structure to assume for residuals. Options include:

"by_column"

Residuals in any given column are assumed to have the same variance.

"by_row"

Residuals in any given row are assumed to have the same variance.

"constant"

All residuals are assumed to have the same variance.

"zero"

The variance of the residuals is fixed. To use this variance type, the standard errors must be specified via the parameter S when creating the flash data object with flash_set_data.

"kroneker"

This variance type has not yet been implemented.
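To make the first three variance structures concrete, the sketch below estimates residual variances from a residual matrix R by pooling squared residuals per column, per row, or over all entries. It is a simplification under stated assumptions: flash works with expected squared residuals under its variational approximation, whereas here R is treated as known, and the helper name `residual_variance_sketch` is hypothetical.

```r
# Hypothetical helper: estimate residual variances under three structures.
# Returns an n x p matrix of variances so the structures can be compared.
residual_variance_sketch <- function(R, var_type = c("by_column", "by_row", "constant")) {
  var_type <- match.arg(var_type)
  n <- nrow(R)
  p <- ncol(R)
  switch(var_type,
    # one variance per column, shared by all rows
    by_column = matrix(colMeans(R^2, na.rm = TRUE), n, p, byrow = TRUE),
    # one variance per row, shared by all columns
    by_row    = matrix(rowMeans(R^2, na.rm = TRUE), n, p),
    # a single variance for every entry
    constant  = matrix(mean(R^2, na.rm = TRUE), n, p))
}

R <- matrix(c(1, 2, 3, 4), nrow = 2)
residual_variance_sketch(R, "by_column")  # each row is c(2.5, 12.5)
```

The three options trade flexibility for stability: "by_column" and "by_row" estimate p or n variances respectively, while "constant" pools everything into one estimate.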

init_fn

The function used to initialize factors. Options include:

"udv_si"

A simple wrapper around softImpute that provides a rank-one initialization, using option type = "als".

"udv_si_svd"

Uses softImpute with option type = "svd".

"udv_svd"

A simple wrapper around svd.

"udv_random"

Provides a random initialization of factors.

A user-specified function can also be used. This function should take parameters (Y, K), where Y is an n by p matrix of data (or a flash data object) and K is the number of factors. It should output a list with elements (u, d, v), where u is an n by K matrix, v is a p by K matrix, and d is a vector of length K. (If the input data include missing values, then the function must be able to handle missing values in its input matrix.)
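A user-specified init_fn only has to honor the (Y, K) to list(u, d, v) contract described above. The sketch below is one hypothetical way to do that with base R's svd, filling missing entries with their column means before decomposing. The function name and the mean-imputation choice are assumptions for illustration, not what the built-in options do, and the sketch assumes Y arrives as a plain numeric matrix rather than a flash data object.

```r
# Hypothetical custom initialization function for init_fn.
# Input: n x p matrix Y (may contain NA) and number of factors K.
# Output: list(u = n x K matrix, d = length-K vector, v = p x K matrix).
udv_mean_impute <- function(Y, K = 1) {
  Y <- as.matrix(Y)
  col_means <- colMeans(Y, na.rm = TRUE)
  col_means[is.na(col_means)] <- 0          # columns that are entirely NA
  miss <- which(is.na(Y), arr.ind = TRUE)   # (row, col) of each NA
  Y[miss] <- col_means[miss[, "col"]]       # fill each NA with its column mean
  s <- svd(Y, nu = K, nv = K)
  list(u = s$u, d = s$d[seq_len(K)], v = s$v)
}

Y <- matrix(c(1, NA, 3, 4, 5, 6), nrow = 3)
init <- udv_mean_impute(Y, K = 1)
str(init)
```

Such a function would be passed directly, for example init_fn = udv_mean_impute, in the same way the softImpute-based function is passed in the Examples section.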

tol

The convergence tolerance: optimization is considered converged once the objective changes by less than tol in a single iteration.
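The role of tol can be illustrated with a rank-one alternating least squares loop that stops once the objective changes by less than tol between iterations. This is a hypothetical sketch: it uses the negative residual sum of squares as the objective instead of flash's objective, and plain least-squares updates instead of empirical Bayes updates. It also assumes rowMeans(Y) is not identically zero, since that is used as the starting loading.

```r
# Hypothetical rank-one alternating least squares with a tol-based stop.
# Objective: negative residual sum of squares (higher is better).
rank_one_als_sketch <- function(Y, tol = 0.01, maxiter = 500) {
  l <- rowMeans(Y)                          # crude starting loadings
  f <- numeric(ncol(Y))
  obj_old <- -Inf
  for (iter in seq_len(maxiter)) {
    f <- drop(crossprod(Y, l)) / sum(l^2)   # least-squares f given l
    l <- drop(Y %*% f) / sum(f^2)           # least-squares l given f
    obj <- -sum((Y - outer(l, f))^2)
    if (abs(obj - obj_old) < tol) break     # change below tol: converged
    obj_old <- obj
  }
  list(l = l, f = f, objective = obj, iterations = iter)
}

res <- rank_one_als_sketch(outer(c(1, 2, 3), c(1, 1)))
res$iterations
```

A smaller tol makes the loop run longer in exchange for a more precisely converged objective; maxiter caps the work if convergence is slow.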

ebnm_fn

The function used to solve the Empirical Bayes Normal Means problem. Acceptable arguments are either a single character string giving the name of the function, or a list with fields l and f specifying different functions to be used for loadings and factors. Options include:

"ebnm_ash"

A wrapper around the function ash in package ashr.

"ebnm_pn"

A wrapper around the function ebnm_point_normal in package ebnm.

"ebnm_pl"

A wrapper around the function ebnm_point_laplace in package ebnm.
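For intuition about the problem ebnm_pn solves, the sketch below computes the posterior mean of theta in the normal means model x ~ N(theta, s^2) under a point-normal prior g = pi0 * delta_0 + (1 - pi0) * N(0, a2), with pi0 and a2 held fixed. This is a simplified illustration, not code from the ebnm package: ebnm_point_normal also estimates the prior from the data by maximizing the marginal likelihood, which this sketch omits, and the function name is hypothetical.

```r
# Hypothetical sketch: posterior mean under a point-normal prior with
# FIXED hyperparameters pi0 (weight on zero) and a2 (slab variance).
# Not numerically hardened for extremely large |x| / s.
point_normal_posterior_mean <- function(x, s, pi0, a2) {
  # marginal density of x when theta = 0 and when theta comes from the slab
  dens_null <- dnorm(x, mean = 0, sd = s)
  dens_slab <- dnorm(x, mean = 0, sd = sqrt(a2 + s^2))
  # posterior probability that theta comes from the slab
  w <- (1 - pi0) * dens_slab / (pi0 * dens_null + (1 - pi0) * dens_slab)
  # given the slab, the usual normal-normal shrinkage applies
  w * x * a2 / (a2 + s^2)
}

point_normal_posterior_mean(c(-3, 0.5, 4), s = 1, pi0 = 0.8, a2 = 4)
```

Small observations are pulled strongly toward zero because the point mass at zero explains them well, which is what produces sparse loadings and factors.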

ebnm_param

A named list containing parameters to be passed to ebnm_fn when optimizing. A list with fields l and f (each of which is a named list) will separately supply parameters for loadings and factors. If the parameter warmstart is used, the current value of g (if available) will be passed to ebnm_fn. (So, ebnm_fn should accept a parameter named g, not one named warmstart.) Set ebnm_param to NULL to use defaults.
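The list forms of ebnm_fn and ebnm_param are ordinary named R lists with fields l and f. The sketch below only builds the argument values; the pairing of solvers and the use of warmstart are illustrative choices that assume both chosen solvers accept a g argument, as required above. The flash_add_greedy call is left commented out because it needs the package loaded.

```r
# Different EBNM solvers for loadings (l) and factors (f).
ebnm_fn <- list(l = "ebnm_pn", f = "ebnm_ash")

# Separate parameter lists for loadings and factors. warmstart asks flash to
# pass the current value of g (if available) to the corresponding ebnm_fn.
ebnm_param <- list(l = list(warmstart = TRUE),
                   f = list(warmstart = TRUE))

# With the package loaded, these would be supplied as:
# fit <- flash_add_greedy(Y, Kmax = 10, ebnm_fn = ebnm_fn,
#                         ebnm_param = ebnm_param)
str(ebnm_param)
```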

verbose

If TRUE, various progress updates will be printed.

nullcheck

If TRUE, then after running hill-climbing updates, flash checks whether the achieved optimum is better than setting the factor to zero. If it is not, the factor is set to zero in the returned fit.
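The null check is a comparison of two objective values followed by keeping the better configuration. The sketch below illustrates that decision rule with a stand-in objective: a Gaussian log-likelihood with known noise standard deviation, minus an optional `penalty` for the fitted factor. flash compares its own objective instead, so the helper names and the penalty term are hypothetical.

```r
# Stand-in objective: Gaussian log-likelihood of residuals with known sd.
gaussian_loglik <- function(R, sd) sum(dnorm(R, mean = 0, sd = sd, log = TRUE))

# Hypothetical null check: keep the rank-one factor (l, f) only if its
# objective beats the objective obtained with the factor set to zero.
nullcheck_sketch <- function(Y, l, f, sd = 1, penalty = 0) {
  obj_fit  <- gaussian_loglik(Y - outer(l, f), sd) - penalty
  obj_zero <- gaussian_loglik(Y, sd)
  if (obj_fit > obj_zero) {
    list(l = l, f = f, kept = TRUE)
  } else {
    list(l = 0 * l, f = 0 * f, kept = FALSE)   # factor set to zero
  }
}

# A factor that matches the data is kept; the same factor fitted to
# all-zero data is discarded.
nullcheck_sketch(outer(1:5, 1:4), l = 1:5, f = 1:4)$kept
nullcheck_sketch(matrix(0, 5, 4), l = 1:5, f = 1:4)$kept
```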

seed

A random number seed to set before running flash, for reproducibility. Set to NULL if you do not want the seed to be set. (The seed can affect initialization when there are missing data; otherwise the algorithm is deterministic.)
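The seed convention (an integer to set, or NULL to leave the random number generator untouched) can be sketched as a small helper. Both helper names are hypothetical, and `random_init` is a stand-in rather than flashr's udv_random; the sketch only demonstrates that setting the same seed before a random initialization reproduces it exactly.

```r
# Hypothetical helper mirroring the documented seed convention.
maybe_set_seed <- function(seed) {
  if (!is.null(seed)) set.seed(seed)   # NULL means: leave the RNG alone
  invisible(NULL)
}

# Stand-in random rank-one initialization returning (u, d, v).
random_init <- function(n, p) list(u = rnorm(n), d = 1, v = rnorm(p))

maybe_set_seed(123); init1 <- random_init(5, 3)
maybe_set_seed(123); init2 <- random_init(5, 3)
identical(init1, init2)   # TRUE: same seed, same initialization
```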

Value

A flash object.

Examples

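set.seed(1)  # make the simulated data reproducible
l = rnorm(100)
f = rnorm(10)
# Rank-one signal plus standard normal noise: a 100 by 10 matrix.
Y = outer(l, f) + matrix(rnorm(1000), nrow=100)
fit = flash_add_greedy(Y, Kmax=10)

# Gives the weights for each factor (analogue of singular values).
fit$ldf$d

# Example to show how to use a different initialization function.
# softImpute returns a list with elements u, d and v, as init_fn requires.
library(softImpute)
fit2 = flash_add_greedy(Y, Kmax=10, init_fn=function(x, K=1) {
  softImpute(x, rank.max=K, lambda=10)
})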


stephenslab/flashr2 documentation built on Feb. 6, 2024, 5:21 a.m.