variational_fit: Variational fit for Binomial mixtures

View source: R/variational_fit.R

variational_fit    R Documentation

Variational fit for Binomial mixtures

Description

Variational fit for a semi-parametric Dirichlet mixture of Binomial distributions. Convergence of the fit is monitored through the evidence lower bound (ELBO), and the fit can be run either sequentially (single core) or in parallel. You need to provide an upper bound on the number of clusters to obtain, through the parameter K. You can also specify the concentration parameter of the Dirichlet prior on the mixture (alpha_0), as well as the hyperparameters of the Beta priors for each mixture component.
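To make the model concrete, the base-R sketch below simulates data from a finite mixture of multivariate Binomials of the kind described above. It is a hypothetical illustration, not VIBER's code: the function simulate_binomial_mixture, its arguments N, D and max_trials, and the uniformly drawn trial counts are assumptions made for this example. Each point draws a cluster label from Dirichlet-distributed weights, and in each dimension its successes follow a Binomial whose success probability has a Beta(a_0, b_0) prior.

```r
# Hypothetical simulator, NOT VIBER code: draws data from a finite mixture of
# multivariate Binomials with Dirichlet(alpha) weights and Beta(a_0, b_0) priors.
simulate_binomial_mixture <- function(N, D, K, alpha = 1, a_0 = 1, b_0 = 1,
                                      max_trials = 100) {
  # Dirichlet draw for the mixing weights via normalised Gamma variables
  g <- rgamma(K, shape = alpha)
  w <- g / sum(g)
  # K x D matrix of success probabilities, one per component and dimension
  theta <- matrix(rbeta(K * D, a_0, b_0), nrow = K, ncol = D)
  # Latent cluster label of each point
  z <- sample.int(K, N, replace = TRUE, prob = w)
  # Number of trials per point and dimension (assumed uniform on 1..max_trials)
  trials <- matrix(sample.int(max_trials, N * D, replace = TRUE), nrow = N, ncol = D)
  # Successes: Binomial(trials[n, d], theta[z[n], d]), filled column by column
  successes <- matrix(rbinom(N * D, size = trials, prob = theta[z, , drop = FALSE]),
                      nrow = N, ncol = D)
  list(successes = successes, trials = trials, labels = z)
}

set.seed(1)
sim <- simulate_binomial_mixture(N = 50, D = 3, K = 4)
```

The successes and trials matrices produced this way have the layout expected for x and y: one row per input point and one column per dimension.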

Usage

variational_fit(
  x,
  y,
  data = NULL,
  K = 10,
  alpha_0 = 1e-06,
  a_0 = 1,
  b_0 = 1,
  max_iter = 5000,
  epsilon_conv = 1e-10,
  samples = 10,
  q_init = "prior",
  trace = FALSE,
  description = "My VIBER model"
)

Arguments

x

A matrix where each column is a dimension of the multivariate Binomial and each row is an input point. Values of this matrix are the numbers of successes of independent Bernoulli trials. This matrix and y must have the same dimensions (N x D: N points, D dimensions).

y

A matrix where each column is a dimension of the multivariate Binomial and each row is an input point. Values of this matrix are the numbers of attempts (trials) of independent Bernoulli experiments, so each entry must be at least as large as the corresponding entry of x. This matrix and x must have the same dimensions (N x D: N points, D dimensions).
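The constraints on x and y can be checked before fitting. The helper below, check_binomial_counts, is hypothetical and not part of VIBER; it verifies that the two matrices have matching dimensions, contain non-negative integer counts, and that no entry of x exceeds the corresponding entry of y.

```r
# Hypothetical input check (not a VIBER function) for successes x and trials y.
check_binomial_counts <- function(x, y) {
  x <- as.matrix(x)
  y <- as.matrix(y)
  if (!identical(dim(x), dim(y)))
    stop("x and y must have the same dimensions (one row per point, one column per dimension)")
  if (anyNA(x) || anyNA(y))
    stop("x and y must not contain missing values")
  if (any(x < 0) || any(x != round(x)) || any(y != round(y)))
    stop("counts must be non-negative integers")
  if (any(x > y))
    stop("successes in x cannot exceed trials in y")
  invisible(TRUE)
}

x <- matrix(c(3, 0, 5, 2), nrow = 2)
y <- matrix(c(10, 4, 5, 9), nrow = 2)
check_binomial_counts(x, y)
```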

data

An optional data.frame (N x W: N points, W attributes) storing W annotations for each of the N input points inside the output object. This parameter can be NULL, in which case no extra annotation is associated with the input. The annotations are required to use VIBER for the analysis of cancer multi-sample sequencing data (in that case the Binomial counts are sequencing read counts); the annotations must then contain two columns, gene and driver, reporting a gene identifier for each input mutation and its logical driver status. The extra annotations are stored in the data field of the output.
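A minimal sketch of an annotation table for three mutations follows. The gene names and driver flags are illustrative values only, and has_driver_annotations is a hypothetical helper that checks the requirements stated above: one row per input point, plus the columns gene and driver, with driver stored as a logical.

```r
# Illustrative annotations for N = 3 mutations (values are made up).
annotations <- data.frame(
  gene   = c("TP53", "KRAS", "TTN"),
  driver = c(TRUE, TRUE, FALSE),
  stringsAsFactors = FALSE
)

# Hypothetical check (not a VIBER function): one row per point in x,
# and the required 'gene' and logical 'driver' columns are present.
has_driver_annotations <- function(data, x) {
  is.data.frame(data) &&
    nrow(data) == nrow(x) &&
    all(c("gene", "driver") %in% colnames(data)) &&
    is.logical(data$driver)
}
```

Such a table would be passed as data = annotations in the call to variational_fit.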

K

The maximum number of clusters returned; it should be lower than the number of rows of x and y. The default is K = 10; lower values speed up convergence.

alpha_0

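The concentration parameter of the Dirichlet prior on the mixture weights. The default, alpha_0 = 1e-6, gives a stringent fit: small values favour solutions in which only a few of the K clusters are used.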
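Why a small alpha_0 yields a stringent fit can be seen in the standard mean-field update for the mixing weights of a finite Dirichlet mixture, where each posterior Dirichlet parameter is alpha_0 plus the expected number of points assigned to that cluster. The sketch below assumes VIBER follows this textbook update; expected_weights is a hypothetical function, and r stands for an N x K matrix of cluster responsibilities.

```r
# Textbook mean-field update for the mixing weights (assumed, not VIBER source).
# r: N x K responsibilities, each row summing to 1.
expected_weights <- function(r, alpha_0) {
  N_k <- colSums(r)          # expected number of points per cluster
  alpha <- alpha_0 + N_k     # parameters of q(pi) = Dirichlet(alpha)
  alpha / sum(alpha)         # posterior mean weight of each cluster
}

# 100 points: 60 in cluster 1, 40 in cluster 2, clusters 3 to 5 empty
labels <- c(rep(1, 60), rep(2, 40))
r <- matrix(0, nrow = 100, ncol = 5)
r[cbind(1:100, labels)] <- 1
expected_weights(r, alpha_0 = 1e-6)
expected_weights(r, alpha_0 = 1)
```

In this example, alpha_0 = 1e-6 gives each empty cluster an expected weight of about 1e-8, whereas alpha_0 = 1 gives each about 0.0095 (1/105), so unused clusters are effectively switched off only with the small value.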

a_0

Prior Beta hyperparameter. If this value is a scalar, all mixture components share the same prior. The default is the scalar a_0 = 1.

b_0

Prior Beta hyperparameter. If this value is a scalar, all mixture components share the same prior. The default is the scalar b_0 = 1; together with a_0 = 1 this gives a Beta(1, 1) prior, which is uniform on [0, 1].
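For each mixture component and dimension, a Beta prior with hyperparameters a_0 and b_0 is conjugate to the Binomial likelihood, so under a mean-field approximation its update adds the responsibility-weighted successes to a_0 and the weighted failures to b_0. The function beta_posterior below is a hypothetical sketch of this textbook update, assuming an N x K responsibility matrix r; it is not taken from VIBER's source.

```r
# Conjugate Beta update per component k and dimension d (assumed, not VIBER source).
# r: N x K responsibilities; x, y: N x D successes and trials.
beta_posterior <- function(r, x, y, a_0 = 1, b_0 = 1) {
  r <- as.matrix(r); x <- as.matrix(x); y <- as.matrix(y)
  a <- a_0 + crossprod(r, x)       # K x D: a_0 + sum_n r[n, k] * x[n, d]
  b <- b_0 + crossprod(r, y - x)   # K x D: b_0 + sum_n r[n, k] * (y[n, d] - x[n, d])
  list(a = a, b = b, mean = a / (a + b))  # posterior means of success probabilities
}

# Two points (8/10 and 2/10 successes), each assigned to its own cluster
post <- beta_posterior(r = diag(2), x = matrix(c(8, 2)), y = matrix(c(10, 10)))
post$mean
```

With the uniform default prior, the two clusters get posterior mean success probabilities 9/12 = 0.75 and 3/12 = 0.25. Because the scalar a_0 and b_0 are added to every entry of the K x D matrices, all components share the same prior, as described above.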

max_iter

Maximum number of fit iterations; the fit stops once this number of iterations has been performed. The default is max_iter = 5000.

epsilon_conv

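Convergence threshold on the absolute difference of the ELBO between successive iterations: the fit is considered converged when this difference falls below epsilon_conv. The default is epsilon_conv = 1e-10.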
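The stopping rule formed by max_iter and epsilon_conv can be sketched as a generic coordinate-ascent loop. In this hypothetical sketch, run_until_converged, step and elbo are illustrative names: step performs one round of variational updates on a state and elbo evaluates the objective. The loop stops as soon as the absolute ELBO change falls below epsilon_conv, and otherwise after max_iter iterations.

```r
# Generic driver (hypothetical, not VIBER internals) combining both stopping rules.
run_until_converged <- function(state, step, elbo, max_iter = 5000,
                                epsilon_conv = 1e-10) {
  trace <- elbo(state)                       # ELBO of the initial state
  for (iter in seq_len(max_iter)) {
    state <- step(state)                     # one round of updates
    trace <- c(trace, elbo(state))
    # Converged when the absolute ELBO change drops below the threshold
    if (abs(trace[iter + 1] - trace[iter]) < epsilon_conv) {
      return(list(state = state, elbo = trace, iterations = iter, converged = TRUE))
    }
  }
  # Iteration budget exhausted without meeting the threshold
  list(state = state, elbo = trace, iterations = max_iter, converged = FALSE)
}

# Toy objective maximised at s = 1; each step halves the distance to the optimum
res <- run_until_converged(0, step = function(s) (s + 1) / 2,
                           elbo = function(s) -(s - 1)^2)
```

Keeping the full ELBO sequence, as done here, also makes it easy to check that the objective does not decrease across iterations.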

samples

Number of fits computed by the algorithm; only the best fit is returned. This value must be greater than or equal to 1. The default is samples = 10.
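Running several fits and keeping the best one guards against poor local optima of the ELBO. The wrapper below is a hypothetical sketch: fit_once stands for any function that runs one fit and returns a list with an elbo trace, and the best fit is assumed to be the one with the highest final ELBO, a criterion the description above does not state explicitly.

```r
# Hypothetical multi-start wrapper: keep the fit with the highest final ELBO.
best_of_samples <- function(fit_once, samples = 10) {
  stopifnot(samples >= 1)
  fits <- lapply(seq_len(samples), function(i) fit_once())
  # Last value of each ELBO trace
  final_elbo <- vapply(fits, function(f) f$elbo[length(f$elbo)], numeric(1))
  fits[[which.max(final_elbo)]]
}
```

Because the fits are independent, lapply could be replaced by parallel::mclapply on Unix-like systems to run them in parallel.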

q_init

Initialization of the q-distribution used to approximate the posterior distributions. This can be set in three ways: equal to the prior (q_init = 'prior'), via k-means clustering (q_init = 'kmeans'), or by capturing points that are private to each dimension (q_init = 'private'). The default is q_init = 'prior'.
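One plausible form of a k-means initialization clusters the observed success proportions x / y and converts the hard labels into a one-hot responsibility matrix. This is a hypothetical sketch (kmeans_init is an illustrative name, and the use of proportions and nstart = 5 are assumptions), not necessarily what q_init = 'kmeans' does internally; it relies only on stats::kmeans.

```r
# Hypothetical k-means initialisation of the N x K responsibilities.
kmeans_init <- function(x, y, K) {
  # Success proportions; pmax guards against division by zero trials
  p <- as.matrix(x) / pmax(as.matrix(y), 1)
  km <- stats::kmeans(p, centers = K, nstart = 5)
  # One-hot encoding of the k-means labels
  r <- matrix(0, nrow = nrow(p), ncol = K)
  r[cbind(seq_len(nrow(p)), km$cluster)] <- 1
  r
}

# Two well-separated groups of points in two dimensions
set.seed(42)
x <- matrix(c(9, 8, 7, 1, 2, 1,  1, 2, 1, 9, 8, 7), ncol = 2)
y <- matrix(10, nrow = 6, ncol = 2)
r <- kmeans_init(x, y, K = 2)
```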

trace

If TRUE, the trace computed during the fit is returned (this allows fits to be inspected a posteriori, animations to be made, etc.). The default is FALSE; this feature can slow down the fit substantially.

Value

An object of class vb_bmm, for which S3 methods are available to extract the fit, plot the results, compute summary statistics, etc.

Examples

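library(VIBER)

# Example dataset with matrices of successes and trials
data(mvbmm_example)

# Fit with the default settings
f = variational_fit(mvbmm_example$successes, mvbmm_example$trials)
print(f)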

caravagn/VIBER documentation built on July 16, 2022, 1:23 a.m.