vdp.mixt (netresponse: Functional Network Analysis)

Description

Accelerated variational Dirichlet process Gaussian mixture.

Usage

  vdp.mixt(
    dat,
    prior.alpha = 1,
    prior.alphaKsi = 0.01,
    prior.betaKsi = 0.01,
    do.sort = TRUE,
    threshold = 1e-05,
    initial.K = 1,
    ite = Inf,
    implicit.noise = 0,
    c.max = 10,
    speedup = TRUE,
    min.size = 5
  )

Arguments

dat: Data matrix (samples x features).
prior.alpha, prior.alphaKsi, prior.betaKsi: Prior parameters of the Gaussian mixture model (normal-inverse-Gamma prior). prior.alpha tunes the mean; prior.alphaKsi and prior.betaKsi are the shape and scale parameters of the inverse Gamma distribution, respectively.
do.sort: When TRUE, the mixture components (columns of qOFz) are sorted in decreasing order of size, computed as colSums(qOFz). The qOFz matrix describes the sample-component assignments of the mixture model.
threshold: Convergence limit: the algorithm stops when the improvement in free energy falls below this value.
initial.K: Initial number of mixture components.
ite: Maximum number of iterations in the posterior update (updatePosterior). Increasing this can yield more accurate results at the cost of longer computation.
implicit.noise: Amount of implicit noise to add; used by vdp.mk.log.lambda.so and vdp.mk.hp.posterior.so. Positive values can, in some cases, help avoid overfitting to local optima.
c.max: Maximum number of candidates to consider in find.best.splitting. During estimation, new mixture components can be created until this upper limit is reached. Defines the truncation level of the truncated stick-breaking process.
speedup: When learning the number of components, each component is split along its first principal component. If TRUE, the PCA is approximated using only a subset of the data, which speeds up computation.
min.size: Minimum size a component must have to be considered for splitting during mixture estimation.
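The c.max argument is described as the truncation level of a truncated stick-breaking process. To illustrate what truncation means, the base R sketch below draws mixture weights from a stick-breaking construction cut off at K components, with the last stick taking all remaining mass so the weights sum to one. This is a prior draw for illustration only, not what vdp.mixt computes; the helper name stick_breaking_weights and its concentration parameter alpha are hypothetical, and alpha here is not the prior.alpha argument.

```r
# Hypothetical helper (not part of netresponse): draw mixture weights from a
# Dirichlet process stick-breaking construction truncated at K components.
# alpha is an assumed DP concentration parameter, unrelated to prior.alpha.
stick_breaking_weights <- function(K, alpha = 1) {
  stopifnot(K >= 1, alpha > 0)
  # Beta(1, alpha) break proportions; the last one is fixed to 1 so that the
  # truncated construction assigns all remaining mass to component K
  v <- c(rbeta(K - 1, 1, alpha), 1)
  # Stick length left before each break: 1, (1 - v1), (1 - v1)(1 - v2), ...
  remaining <- cumprod(c(1, 1 - v[-K]))
  v * remaining
}

set.seed(1)
w <- stick_breaking_weights(K = 10, alpha = 1)
round(w, 3)
sum(w)  # 1, up to floating-point error
```

Fixing the last break proportion to 1 is what makes the truncation exact: no probability mass is left for components beyond K.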

Details

Implementation of the Accelerated variational Dirichlet process Gaussian mixture model algorithm by Kenichi Kurihara et al., 2007.

ALGORITHM SUMMARY

This code implements Gaussian mixture models with diagonal covariance matrices. The following greedy iterative approach is used to obtain the number of mixture components and their corresponding parameters:

1. Start from one cluster, $T = 1$.
2. Select a number of candidate clusters according to their sizes $N_c = \sum_{n=1}^{N} q_{z_n}(z_n = c)$ (larger is better).
3. For each candidate cluster $c$:
   3a. Split $c$ into two clusters, $c_1$ and $c_2$, through the bisector of its principal component, and initialise the responsibilities $q_{z_n}(z_n = c_1)$ and $q_{z_n}(z_n = c_2)$.
   3b. Update only the parameters of $c_1$ and $c_2$, using the observations that belonged to $c$, and compute the new free energy $F_{T+1}$.
   3c. Reassign cluster labels so that cluster 1 corresponds to the largest cluster, cluster 2 to the second largest, and so on.
4. Select the split that leads to the largest reduction in free energy, i.e. the smallest $F_{T+1}$.
5. Update the posterior using the newly split data.
6. If $F_T - F_{T+1} < \epsilon$, halt; otherwise set $T := T + 1$ and go to step 2.
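Step 3a splits a candidate cluster through the bisector of its principal component. One reading of this, sketched below in base R under the assumption of hard membership, is to project the cluster's points onto the first principal component (via prcomp) and assign each point to $c_1$ or $c_2$ by the sign of its centered projection, i.e. by which side of the hyperplane through the cluster mean, orthogonal to the principal axis, it falls on. The helper split_by_pc1 is hypothetical; the package initialises soft responsibilities and, with speedup = TRUE, may compute the PCA on a data subset, neither of which is reproduced here.

```r
# Hypothetical helper (not part of netresponse): split the rows of x, the
# points of one cluster, through the hyperplane that passes through the
# cluster mean and is orthogonal to the first principal component.
split_by_pc1 <- function(x) {
  x <- as.matrix(x)
  pca <- prcomp(x, center = TRUE, scale. = FALSE)
  # pca$x[, 1] is each point's centered projection on the first PC;
  # its sign tells which side of the bisecting hyperplane the point lies on
  proj <- pca$x[, 1]
  ifelse(proj >= 0, 1L, 2L)  # labels for c1 and c2
}

set.seed(123)
# Toy cluster made of two groups separated along the diagonal
x <- rbind(matrix(rnorm(100, mean = -3), ncol = 2),
           matrix(rnorm(100, mean = 3), ncol = 2))
lab <- split_by_pc1(x)
table(lab, rep(1:2, each = 50))
```

The sign of a principal component is arbitrary, so which half receives label 1 can differ between runs or platforms; only the partition itself is meaningful.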

The loop is implemented in the function greedy(...)
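The control flow of the greedy search, in particular the stopping rule of step 6, can be sketched without any model-specific updates. In the skeleton below, free_energy is a hypothetical user-supplied function that returns the free energy after growing the model to K components through the best split (steps 2 to 5), threshold plays the role of $\epsilon$, and max.K is an assumed safety cap. This does not reproduce the signature or internals of the greedy(...) function in netresponse.

```r
# Hypothetical skeleton of the greedy outer loop (steps 1, 5 and 6).
# free_energy(K) stands in for "find the best split, update the posterior,
# and report the free energy of the resulting K-component model".
greedy_sketch <- function(free_energy, threshold = 1e-5, max.K = 10) {
  K <- 1                          # step 1: start from one cluster
  F_K <- free_energy(K)
  while (K < max.K) {
    F_next <- free_energy(K + 1)  # steps 2 to 5 for the best split
    # step 6: halt when the decrease in free energy is below the threshold
    if (F_K - F_next < threshold) break
    K <- K + 1
    F_K <- F_next
  }
  K
}

# Toy free-energy values for K = 1, ..., 5 (invented for illustration)
toy_F <- c(100, 60, 50, 49.999999, 49)
greedy_sketch(function(K) toy_F[K], threshold = 1e-5)  # returns 3
```

The toy run shows the greedy character of the rule: the search halts at K = 3 because the step to K = 4 improves the free energy by less than the threshold, even though K = 5 would have been better.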

Value

  prior: Prior parameters of the vdp-gm model (qofz: priors on observation labels; Mu: centroids; S2: variance).
  posterior: Posterior estimates for the model parameters and statistics.
  weights: Mixture proportions (weights) of the Gaussian mixture components.
  centroids: Centroids of the mixture components.
  sds: Standard deviations of the mixture components, i.e. the square roots of the posterior modes of the diagonal covariance entries, computed as sqrt(invgam.scale/(invgam.shape + 1)).
  qOFz: Sample-to-component assignments (soft probabilistic associations).
  Nc: Component sizes.
  invgam.shape: Shape parameter (alpha) of the inverse Gamma distribution.
  invgam.scale: Scale parameter (beta) of the inverse Gamma distribution.
  Nparams: Number of model parameters.
  K: Number of components in the mixture model.
  opts: Model parameters that were used.
  free.energy: Free energy of the model.
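Several returned quantities are simple functions of others. The base R sketch below uses invented toy matrices (not output of vdp.mixt) to show three of them: component sizes as column sums of qOFz, matching the definition of $N_c$ in the algorithm summary and the ordering described for do.sort; hard assignments via which.max, as in the example; and standard deviations from the inverse Gamma mode, sqrt(invgam.scale/(invgam.shape + 1)). The samples x components layout of qOFz follows the colSums(qOFz) description; the components x features layout of the inverse Gamma matrices is an assumption for illustration.

```r
# Toy soft assignments: 4 samples (rows) x 2 components (columns);
# each row sums to one. Values are invented for illustration.
qOFz <- matrix(c(0.9, 0.1,
                 0.2, 0.8,
                 0.3, 0.7,
                 0.1, 0.9), ncol = 2, byrow = TRUE)

# Component sizes N_c = sum_n q(z_n = c): column sums of qOFz
Nc <- colSums(qOFz)                       # 1.5 2.5

# Reorder components by decreasing size, as described for do.sort = TRUE
ord <- order(Nc, decreasing = TRUE)
qOFz.sorted <- qOFz[, ord, drop = FALSE]

# Hard assignments: most probable component for each sample
hard <- apply(qOFz.sorted, 1, which.max)  # 2 1 1 1

# Toy inverse Gamma parameters (components x features), also invented
invgam.shape <- matrix(c(10, 12,
                         20, 22), nrow = 2, byrow = TRUE)
invgam.scale <- matrix(c(11, 52,
                         189, 368), nrow = 2, byrow = TRUE)

# Square root of the inverse Gamma mode beta / (alpha + 1)
sds <- sqrt(invgam.scale / (invgam.shape + 1))
sds                                       # rows: 1 2 and 3 4
```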

Note

This implementation is based on the Variational Dirichlet Process Gaussian Mixture Model implementation, Copyright (C) 2007 Kenichi Kurihara (all rights reserved) and the Agglomerative Independent Variable Group Analysis package (in Matlab): Copyright (C) 2001-2007 Esa Alhoniemi, Antti Honkela, Krista Lagus, Jeremias Seppa, Harri Valpola, and Paul Wagner.

Author(s)

Maintainer: Leo Lahti leo.lahti@iki.fi

References

Kenichi Kurihara, Max Welling and Nikos Vlassis: Accelerated Variational Dirichlet Process Mixtures. In B. Schölkopf, J. Platt and T. Hoffman (eds.), Advances in Neural Information Processing Systems 19, 761–768. MIT Press, Cambridge, MA, 2007.

Examples

  library(netresponse)
  set.seed(123)

  # Generate toy data with two Gaussian components
  dat <- rbind(array(rnorm(400), dim = c(200, 2)) + 5,
               array(rnorm(400), dim = c(200, 2)))

  # Infinite Gaussian mixture model with
  # Variational Dirichlet Process approximation
  mixt <- vdp.mixt(dat)

  # Centroids of the detected Gaussian components
  mixt$posterior$centroids

  # Hard mixture component assignments for the samples
  apply(mixt$posterior$qOFz, 1, which.max)

netresponse documentation built on Nov. 8, 2020, 5:04 p.m.