knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
library(VIBER)
library(tidyverse)

To run the fit you need two tibbles (or matrices) with equal dimensions and matching column names. One contains the number of successful Bernoulli trials, the other the total number of attempted trials.

The package provides mvbmm_example, a dataset that illustrates the input format.

data("mvbmm_example")

The format is as follows (S1 and S2 are the dimensions):

mvbmm_example$successes
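To make the input requirements concrete, here is a minimal sketch that checks a pair of count tables before fitting. The helper check_viber_input is hypothetical and is not part of VIBER. The toy data use plain data frames rather than tibbles so that the example runs in base R.

```r
# Hypothetical helper (not part of VIBER): checks that a pair of count
# tables has the shape described above before calling variational_fit
check_viber_input <- function(successes, trials) {
  successes <- as.matrix(successes)
  trials <- as.matrix(trials)
  stopifnot(
    identical(dim(successes), dim(trials)),            # equal dimensions
    identical(colnames(successes), colnames(trials)),  # matching column names
    all(successes >= 0),                               # counts are non-negative
    all(successes <= trials)                           # successes never exceed trials
  )
  invisible(TRUE)
}

# Toy data: 3 points observed in two dimensions, S1 and S2
successes <- data.frame(S1 = c(10, 0, 25), S2 = c(12, 30, 0))
trials    <- data.frame(S1 = c(50, 40, 60), S2 = c(55, 70, 45))

check_viber_input(successes, trials)
```

Swapping the two arguments fails the last check, because successes would then exceed trials.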

Fitting the model

You can fit the model with the variational_fit function; its help page documents every parameter:

?variational_fit

You have control over several parameters. Concerning the mixture type, you can set the following:

Concerning the variational optimization, you can set the following:

The fitting engine uses the easypar package to run the required number of fits in parallel. By default, it uses a multi-core implementation with 80% of the available cores. You can disable parallel runs and execute the fits sequentially by turning off easypar, as explained in its Wiki.
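The sketch below illustrates running independent fits in parallel versus sequentially. It uses base R's parallel package as a stand-in for easypar, so it does not reproduce easypar's own logic. The "80% of cores" rule is reimplemented here as an assumption, and one_fit is a hypothetical placeholder for a single model fit.

```r
library(parallel)

# Assumption: mimic "80% of the available cores", never fewer than one core
cores <- detectCores()
if (is.na(cores)) cores <- 1L
n_cores <- as.integer(max(1L, floor(0.8 * cores)))

# mclapply relies on forking, which is unavailable on Windows
if (.Platform$OS.type == "windows") n_cores <- 1L

# Hypothetical stand-in for a single fit: each run uses its own seed
one_fit <- function(seed) {
  set.seed(seed)
  mean(rbinom(100, size = 50, prob = 0.3))
}

seeds <- 1:4
fits_parallel   <- mclapply(seeds, one_fit, mc.cores = n_cores)
fits_sequential <- lapply(seeds, one_fit)

# Seeding inside each run makes results independent of how runs are scheduled
identical(fits_parallel, fits_sequential)
```

Seeding inside each run is the design choice that matters here. With that in place, switching between parallel and sequential execution changes only the run time, not the results.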

We run the fit with default parameters; the output model is printed to screen.

fit = variational_fit(
  mvbmm_example$successes,
  mvbmm_example$trials
)

Filtering output clusters

Because the model is semi-parametric, it will attempt to use at most K Binomial clusters. However, many of those clusters might not be interesting, and you might want to filter them out.

VIBER implements two filters, both available through the function choose_clusters.

After filtering, output clusters are renamed by size (C1 is the largest, and so on), and the latent variables and hard clustering assignments are updated accordingly.
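As an illustration of renaming by size, the sketch below applies the idea to a hypothetical vector of hard assignments. It is not VIBER's internal code. In a real fit, the columns of the latent variables would be permuted in the same way. Ties in size are not handled specially here, and VIBER's rule for them is not stated.

```r
# Hypothetical hard assignments after filtering (labels are arbitrary)
labels <- c("C3", "C1", "C3", "C5", "C3", "C1")

# Cluster sizes, sorted from largest to smallest
sizes <- sort(table(labels), decreasing = TRUE)

# Map old labels to C1, C2, ... by decreasing size
relabel <- setNames(paste0("C", seq_along(sizes)), names(sizes))

relabelled <- unname(relabel[labels])
relabelled
# "C1" "C2" "C1" "C3" "C1" "C2"
```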

Here we only require each cluster to account for at least 2% of the total number of points.

fit = choose_clusters(fit, 
                      binomial_cutoff = 0, 
                      dimensions_cutoff = 0,
                      pi_cutoff = 0.02)
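The size filter can be read as a threshold on the mixing proportions. The sketch below applies it to a hypothetical set of proportions. Whether choose_clusters keeps clusters exactly at the cutoff (>= rather than >) is an assumption.

```r
# Hypothetical mixing proportions of a fit with K = 6 clusters
mixing <- c(C1 = 0.40, C2 = 0.35, C3 = 0.15, C4 = 0.07, C5 = 0.015, C6 = 0.015)

n_points  <- 500   # hypothetical number of data points
pi_cutoff <- 0.02

# Clusters that pass the size filter (assumed inclusive comparison)
keep <- names(mixing)[mixing >= pi_cutoff]
keep
# "C1" "C2" "C3" "C4"

# With 500 points, a 2% cutoff corresponds to clusters of about 10 points
pi_cutoff * n_points
```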

The filtered model has fewer clusters:

fit 

Renaming output clusters

Fit clusters can be renamed by defining a named vector whose values are the new labels and whose names are the current labels to change. This helps when some clusters have a specific interpretation that we want to convey in plots.

# rename 6 clusters as new_C_1, new_C_2, ....
new_labels = paste0("new_C_", 1:6)
names(new_labels) = paste0("C", 1:6)

print(new_labels)

# renaming
fit_renamed = rename_clusters(fit, new_labels)
print(fit_renamed)
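The named vector works as a lookup table. Indexing it by the current labels returns the new ones. The sketch below shows this on a hypothetical vector of assignments. Whether rename_clusters performs exactly this lookup internally is an assumption.

```r
# Named vector as in the example above: names are old labels, values new ones
new_labels <- paste0("new_C_", 1:6)
names(new_labels) <- paste0("C", 1:6)

# Hypothetical hard assignments of a few points
assignments <- c("C2", "C1", "C6", "C2")

# Indexing by name translates each old label into its new one
renamed <- unname(new_labels[assignments])
renamed
# "new_C_2" "new_C_1" "new_C_6" "new_C_2"
```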

In what follows, we use the original names.

Plots

Clustering assignments and latent variables statistics

You can plot the data, one dimension against the other, with plot_2D (for instance, try plot_2D(fit, d1 = 'S1', d2 = 'S2')), or use the S3 method plot(fit) to compute a list of plots, one for each pair of dimensions in the mixture.

plot(fit)
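The number of plots grows with the number of unordered pairs of dimensions, that is, D choose 2. The sketch below enumerates those pairs for three hypothetical dimensions and prints the equivalent plot_2D calls as text. It assumes that plot(fit) covers each unordered pair once.

```r
# Hypothetical dimension names; the example dataset has only S1 and S2
dims <- c("S1", "S2", "S3")

# Every unordered pair of dimensions, one per column
pairs <- combn(dims, 2)
ncol(pairs)  # choose(3, 2) = 3

# The corresponding plot_2D calls, written out as text
calls <- apply(pairs, 2, function(p) {
  sprintf("plot_2D(fit, d1 = '%s', d2 = '%s')", p[1], p[2])
})
calls
```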

You can plot the mixing proportions of the mixture:

plot_mixing_proportions(fit)
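In a variational Bayesian mixture with a symmetric Dirichlet prior of concentration alpha_0, the expected mixing proportion of cluster k is (alpha_0 + N_k) / (K alpha_0 + N). Here N_k is the sum of the responsibilities of cluster k and N is the number of points. The sketch below computes this quantity from a hypothetical responsibility matrix. Treating it as the quantity VIBER plots is an assumption.

```r
# Hypothetical responsibilities r[n, k]: 4 points, K = 3 clusters (rows sum to 1)
r <- matrix(c(0.9, 0.1, 0.0,
              0.8, 0.2, 0.0,
              0.1, 0.9, 0.0,
              0.0, 0.5, 0.5),
            nrow = 4, byrow = TRUE)
alpha_0 <- 1e-6  # assumed symmetric Dirichlet concentration

N_k <- colSums(r)  # expected number of points per cluster
mixing <- (alpha_0 + N_k) / (ncol(r) * alpha_0 + nrow(r))
round(mixing, 3)
```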

Finally, you can plot the latent variables of the mixture:

plot_latent_variables(fit)
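The latent variables are per-point cluster responsibilities. The usual way to derive hard clustering assignments is to pick, for each point, the cluster with the largest responsibility. The sketch below applies that rule to a hypothetical matrix. We assume that VIBER follows it.

```r
# Hypothetical latent variables: responsibilities of 4 points over 3 clusters
r <- matrix(c(0.9, 0.1, 0.0,
              0.2, 0.7, 0.1,
              0.1, 0.2, 0.7,
              0.3, 0.6, 0.1),
            nrow = 4, byrow = TRUE,
            dimnames = list(NULL, c("C1", "C2", "C3")))

# Hard clustering: each point goes to the cluster with the largest responsibility
hard <- colnames(r)[max.col(r, ties.method = "first")]
hard
# "C1" "C2" "C3" "C2"
```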

Note how the plot of a renamed object uses the new labels:

plot(fit_renamed)

Evidence Lower Bound (ELBO)

You can plot the ELBO:

plot_ELBO(fit)
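When reading an ELBO plot, two things are worth checking. With exact coordinate-ascent updates, the ELBO should not decrease between iterations, and it should flatten out at convergence. The sketch below checks both on a hypothetical ELBO trace. The relative-change stopping rule and its tolerance are assumptions, not VIBER's documented criterion.

```r
# Hypothetical ELBO trace recorded over iterations
elbo <- c(-1500, -1200, -1100, -1080, -1079.5, -1079.4999)

# With exact coordinate-ascent updates the ELBO should never decrease
monotone <- all(diff(elbo) >= 0)

# Relative change between consecutive iterations
rel_change <- abs(diff(elbo)) / abs(elbo[-1])

# Assumed stopping rule: first iteration whose relative change drops below epsilon
epsilon <- 1e-6
converged_at <- which(rel_change < epsilon)[1] + 1
converged_at
```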

Binomial parameters

You can plot the Binomial peaks, per cluster and per dimension:

plot_peaks(fit)
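Each peak is a Binomial success probability for one cluster and one dimension. In a conjugate Beta-Binomial mixture, the variational posterior of that parameter is Beta(a_0 + sum_n r_nk x_nd, b_0 + sum_n r_nk (t_nd - x_nd)). Here x are the successes and t are the trials. The sketch below computes the posterior means on hypothetical data. The Beta(1, 1) prior, and the assumption that VIBER's peaks correspond to these means, are ours.

```r
# Hypothetical data: 4 points in 2 dimensions
successes <- matrix(c(10, 20,
                      12, 18,
                      40,  5,
                      38,  6), nrow = 4, byrow = TRUE,
                    dimnames = list(NULL, c("S1", "S2")))
trials <- matrix(50, nrow = 4, ncol = 2, dimnames = dimnames(successes))

# Hypothetical responsibilities over K = 2 clusters (hard, for readability)
r <- matrix(c(1, 0,
              1, 0,
              0, 1,
              0, 1), nrow = 4, byrow = TRUE,
            dimnames = list(NULL, c("C1", "C2")))

a_0 <- 1; b_0 <- 1  # assumed Beta(1, 1) prior on each Binomial parameter

# Posterior Beta parameters: K x D matrices (clusters by dimensions)
a <- a_0 + t(r) %*% successes
b <- b_0 + t(r) %*% (trials - successes)

peaks <- a / (a + b)  # posterior mean of each Binomial parameter
round(peaks, 3)
```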


caravagn/VIBER documentation built on July 16, 2022, 1:23 a.m.