Variational Bayesian Gaussian mixture model (VB-GMM)

Description

Given an N x D matrix of N observations and D variables, fit a VB-GMM via variational Bayesian expectation-maximization (VB-EM).

Usage

vbgmm(data, init = 2, prior, tol = 1e-20, maxiter = 2000, mirprior = TRUE, expectedTargetFreq = 0.01, verbose = FALSE)

Arguments

data

Numeric vector of N observations, or N x D numeric matrix of N observations (rows) and D variables (columns)

init

Depending on its dimension, init is expected to be one of the following: a scalar giving the number of components; a vector of initial class labels; or a D x K matrix (D variables, K components) used for initialization.

prior

A list of hyperparameters: alpha (Dirichlet), m (Gaussian mean), kappa (Gaussian variance), v (Wishart degrees of freedom), M (Wishart precision matrix).
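
As a rough sketch, such a list might look like the following for D = 2 variables. The element names follow the description above; the numeric values are arbitrary placeholders rather than the package defaults, and the shapes of m and M are assumptions.

```r
# Hypothetical hyperparameters for D = 2 variables.
# Values are illustrative placeholders, not the defaults used by vbgmm().
D <- 2
prior <- list(
  alpha = 1,          # Dirichlet hyperparameter
  m     = rep(0, D),  # Gaussian mean (assumed to have length D)
  kappa = 1,          # Gaussian variance hyperparameter
  v     = D + 1,      # Wishart degrees of freedom (must exceed D - 1)
  M     = diag(D)     # Wishart matrix (assumed to be D x D)
)
str(prior)
```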

tol

Convergence threshold: VB-EM terminates when abs(L[t] - L[t-1])/abs(L[t]) < tol, where L[t] is the variational lower bound at iteration t
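
The stopping rule can be sketched as below. has_converged is a hypothetical helper written for illustration, not a function exported by the package, and the lower-bound values are made up.

```r
# Relative-change stopping rule, as described for `tol`.
# `has_converged` is a hypothetical helper, not part of the package.
has_converged <- function(L, t, tol = 1e-20) {
  abs(L[t] - L[t - 1]) / abs(L[t]) < tol
}

L <- c(-350.0, -320.5, -320.4999)  # made-up lower-bound values
has_converged(L, 3, tol = 1e-3)    # relative change is about 3e-7
has_converged(L, 2, tol = 1e-3)    # relative change is about 0.09
```

Note that the default tol = 1e-20 is below double-precision resolution (about 2.2e-16), so the rule is effectively met only when the bound stops changing exactly. A looser value, such as the 1e-3 used in the Examples, may be more practical.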

maxiter

Scalar for maximum number of EM iterations

mirprior

Boolean indicating whether to use expectedTargetFreq to initialize alpha0, the Dirichlet hyperparameter.

expectedTargetFreq

Expected target frequency within the gene population. The default of 0.01 is consistent with the widely accepted prior knowledge of 200 targets per miRNA among 20000 genes (200/20000).

verbose

Boolean indicating whether to show progress in terms of lower bound (vbound) of VB-EM (default: FALSE)

Details

The function implements the variational Bayesian multivariate GMM described in Bishop (2006); see the References for details. It is the workhorse of targetScore, but users can also apply it to problems other than miRNA target prediction.

Value

A list containing:

label

a vector of maximum-a-posteriori (MAP) assignments of latent discrete values based on the posteriors of latent variables.
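
The MAP assignment can be illustrated with a toy posterior matrix. This is a sketch that assumes one row per observation and one column per component; the package's internal computation may differ.

```r
# Toy posterior matrix: 4 observations (rows), 2 components (columns).
# The values are made up for illustration.
R <- matrix(c(0.9, 0.1,
              0.2, 0.8,
              0.6, 0.4,
              0.3, 0.7),
            ncol = 2, byrow = TRUE)

# MAP assignment: index of the component with the largest posterior per row.
label <- apply(R, 1, which.max)
label
```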

R

N x K matrix of posteriors of latent variables (N observations, K components)

mu

Gaussian means of the latent components

full.model

A list containing the posteriors R, logR, and the model parameters: alpha (Dirichlet), m (Gaussian mean), kappa (Gaussian variance), v (Wishart degrees of freedom), M (Wishart precision matrix)

L

A vector of variational lower bound values, one per EM iteration (should be non-decreasing)
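
A quick sanity check on the returned bound might look like this. It is a sketch using a made-up trace and an arbitrary tolerance for floating-point noise; in practice the L element of the result would be used instead.

```r
# Made-up lower-bound trace; in practice use the L element of the result.
L <- c(-410.2, -385.7, -380.1, -380.1)

# Check that the bound never drops, allowing for floating-point noise.
is_nondecreasing <- all(diff(L) >= -1e-8)
is_nondecreasing
```

A drop larger than numerical noise usually points to a numerical problem or an implementation bug rather than a property of the data.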

Author(s)

Yue Li

References

Mo Chen (2012). Matlab code for Variational Bayesian Inference for Gaussian Mixture Model. http://www.mathworks.com/matlabcentral/fileexchange/35362-variational-bayesian-inference-for-gaussian-mixture-model

Bishop, C. M. (2006). Pattern recognition and machine learning. Springer, Information Science and Statistics. NY, USA. (pp. 474-486)

See Also

targetScore

Examples

X <- c(rnorm(100,mean=2), rnorm(100,mean=3))
tmp <- vbgmm(X, tol=1e-3)
names(tmp)