While the relabeling algorithm proposed in Stephens (2000) has been shown to perform well in dealing with the label switching phenomenon in Bayesian finite mixture models, and has the attractive feature of requiring no additional information beyond the posterior samples, it has been pointed out that it is computationally expensive. Currently, the `label.switching::stephens` function provides the only R implementation of Stephens' method. Yet, the function is rather slow, which makes it impractical to use on MCMC output of moderate size.

The `relabelKL` package is a C++ implementation of Stephens' relabeling algorithm. It is implemented using the Armadillo library and sourced via the `RcppArmadillo` package. If available, the provided functions will use OpenMP to run parts of the code in parallel. The package also provides a set of helper functions to facilitate integration with `stanfit` objects produced via the `rstan` package.

A comparison between the performance of the `label.switching::stephens` function and the `relabelKL::relabelMCMC` function is provided at the end of this document.
Using the `devtools` package, the development version of `relabelKL` can be installed from GitHub:

```r
devtools::install_github("baruuum/relabelKL")
library(relabelKL)
```

Mac OS users might experience compilation issues due to the OpenMP dependency. To set up the machine and install the necessary compilation tools, follow the instructions on this page.
Relabeling MCMC output is done with the `relabelMCMC` function. The function takes as input an array of dimensions S × N × K, where S is the number of posterior draws, N the number of units/individuals, and K the number of latent classes/categories/types to which they belong. The entries are either probabilities or log-probabilities of unit i = 1, 2, ..., N belonging to one of the k = 1, 2, ..., K classes/types.
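For illustration, an input array of this form might be constructed as follows. This is only a sketch: the object names and dimensions are arbitrary, and the normalized exponential draws simply mimic rows that sum to one.

```r
# sketch of the expected input format; all names and sizes are arbitrary
S = 100   # number of posterior draws
N = 20    # number of units
K = 3     # number of latent classes

# random positive values, normalized so that each unit's class probabilities sum to one
x = array(rexp(S * N * K), dim = c(S, N, K))
x = sweep(x, c(1, 2), apply(x, c(1, 2), sum), "/")

str(x)  # num [1:100, 1:20, 1:3]
```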
Several options may be specified:

- `log_p`: whether the entries are log-probabilities or probabilities (defaults to `log_p = TRUE`)
- `renormalize`: if `TRUE`, renormalizes the rows of each slice of `x` to sum to one
- `maxit`: the maximum number of iterations (defaults to `maxit = 100`)
- `nthreads`: the number of threads to use in the calculations (if OpenMP is enabled)
- `verbose`: whether intermediate results should be printed (defaults to `verbose = TRUE`)

Calling `relabelMCMC` on the array `x`,
```r
res = relabelMCMC(x)
```
returns a `list` of four elements:

- `permuted`: the same object as `x` but with the columns of each posterior draw relabeled
- `perms`: an S × K matrix which contains, for each draw, the optimal permutation. That is, the s-th row of `perms` shows how the s-th posterior draw was permuted, so that applying the permutation in `perms` to the initial array, `x`, will result in the relabeled array (see the sketch below)
- `iterations`: the number of iterations for which the algorithm was run
- `status`: `0` if the algorithm has successfully converged and `1` otherwise
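To make the role of `perms` concrete, here is a small sketch that checks, for a single draw, that reordering the columns of `x` by the corresponding row of `perms` reproduces the relabeled draw. The exact indexing convention used here is an assumption; if the check fails, the inverse permutation (`order(res$perms[s, ])`) would be the one to apply.

```r
# sketch: verify the permutation mapping for a single draw (indexing convention assumed)
s = 1
reordered = x[s, , res$perms[s, ]]          # reorder the K columns of draw s
all.equal(reordered, res$permuted[s, , ])   # TRUE under the assumed convention
```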
There are often other parameters in the model that depend on the labeling of the latent classes / extreme types. The `permuteMCMC` function can be used in this situation. For a three-dimensional array, `y`, whose first dimension corresponds to the S posterior draws, `permuteMCMC` can be called with the results from the `relabelMCMC` output (or from any other relabeling algorithm that returns the permutation mapping). Calling, for example,
```r
y_relabeled = permuteMCMC(y, perms = res$perms, what = "dimension to permute")
```
will permute either the rows, the columns, or both the rows and columns of each of the S sub-arrays of `y`, based on `res$perms`.
A few remarks on `permuteMCMC`:

- The `what` option: specifying `"rows"` will permute the rows of each of the s = 1, 2, ..., S draws, specifying `"cols"` will permute the columns, and `"both"` will permute both the rows and the columns.
- The `permuteMCMC` function assumes that the first index of a three-dimensional array represents the posterior draws, so that entering an array of dimensions, say, N × B × S will lead to undefined behavior (see the sketch below).
- If a matrix is passed to `permuteMCMC`, it is assumed to have dimensions S × K, and the function will always permute the columns.
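If an array happens to store the posterior draws in its last dimension, it can be rearranged with base R's `aperm` before calling `permuteMCMC`. The sketch below illustrates this; the object names are made up, and the appropriate `what` value still depends on which dimension of the rearranged array carries the K labels.

```r
# sketch: move posterior draws from the last to the first dimension before permuting
# (object names are hypothetical; dims N x B x S  ->  S x N x B)
y_ok = aperm(y_bad, c(3, 1, 2))
y_relabeled = permuteMCMC(y_ok, perms = res$perms, what = "dimension to permute")
```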
If the "true" labels of a stochastic blockmodel or latent class model are known in advance, assigning each individual to their true class is straightforward. Yet, there are situations in which we want to make the assignment probabilities of each posterior draw as close as possible to a set of known/true probabilities. These situations arise, for example, when MAP or MLE estimates of the membership vectors of mixed membership models are calculated and we want to use them as pivots to relabel the MCMC samples. In these cases, the `relabelTRUE` function might be used as follows:
```r
rel_true = relabelTRUE(x = x, x_true = x_true, verbose = FALSE, log_p = FALSE)
```
where `x_true` is an N × K matrix of "true" assignment probabilities / mixed membership vectors. The function will try to relabel each posterior draw in `x` so that it is as close as possible to `x_true` in terms of KL-distance. When the `log_p` option is set to `TRUE`, both `x` and `x_true` should be entered on the log-scale; otherwise, the function will throw an error.
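For example, if the probabilities are stored on the natural scale, they might be transformed before calling the function with `log_p = TRUE`. This is only a sketch; `x` and `x_true` are the same objects as above.

```r
# sketch: passing log-probabilities explicitly
rel_true_log = relabelTRUE(x = log(x), x_true = log(x_true), verbose = FALSE, log_p = TRUE)
```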
## `rstan` objects

The package comes with one dataset called `mmsbm`, which is a `stanfit` object obtained from running a mixed membership stochastic blockmodel on simulated data. The model has two parameters: `pi`, a matrix of mixed membership vectors, where each row indicates the probability of individual i = 1, 2, ..., N enacting type k = 1, 2, ..., K in the interaction process, and `theta`, the so-called image matrix, which is of dimensions K × K and reflects the association tendencies between the K pure types.
The data can be loaded as

```r
library(relabelKL)
data(mmsbm)
```
The `relabelKL` package provides a wrapper function to extract posterior samples from a `stanfit` object and rearrange the parameter draws so that they can be directly passed to the `relabelMCMC` function.
Calling
```r
pi_arr = extract_n_combine(mmsbm, par = "pi")
```
will combine the post-warmup draws across all MCMC chains into the first dimension of a three-dimensional array, where the other dimensions are identical to the dimensions specified in the Stan program. For example, if `pi` was specified as a `matrix[L, M]` object, the first dimension of `pi_arr` will have length S * J, where S is, as before, the number of post-warmup draws and J the number of MCMC chains, and the two last dimensions of the extracted object will be L and M.
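As a rough check of this shape, the combined array might be compared against the raw `rstan::extract` output, which returns a draws-by-chains-by-parameters array when `permuted = FALSE`. This is a sketch; the commented dimensions depend on how `mmsbm` was fit.

```r
# sketch: compare the combined array against the raw rstan output
raw = rstan::extract(mmsbm, pars = "pi", permuted = FALSE)  # S x J x (N * K)
pi_arr = extract_n_combine(mmsbm, par = "pi")
dim(raw)      # e.g., c(S, J, N * K)
dim(pi_arr)   # should be c(S * J, N, K)
```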
For the `mmsbm` dataset, calling `extract_n_combine` results in an array of dimensions

```r
dim(pi_arr)
```
This object can then be passed to `relabelMCMC` or `permuteMCMC` as follows:

```r
rel = relabelMCMC(pi_arr, maxit = 50L, verbose = TRUE, log_p = FALSE)
```
The `rel$permuted` element will contain the relabeled `pi` array, and we can apply the same permutations to other parameters by using the `rel$perms` object, which contains the permutations applied to the original MCMC output:
```r
# get permuted labels
rel_pi_arr = rel$permuted

# extract image matrix from stanfit object
theta_arr = extract_n_combine(mmsbm, par = "theta")

# apply same permutation to theta
rel_theta_arr = permuteMCMC(theta_arr, rel$perms, "both")
```
Lastly, to monitor the convergence of the relabeled posterior draws, we have to transform the relabeled arrays into a form that can be passed to the `rstan::monitor` function. This can be done with the `to_stan_array` function as follows:

```r
# monitor convergence of original output
rstan::monitor(to_stan_array(theta_arr))

# same statistics for the relabeled output
rstan::monitor(to_stan_array(rel_theta_arr))
```
Similarly, traceplots might be inspected by using

```r
array_traceplot(to_stan_array(rel_theta_arr), par = "theta")
```
## Comparison with `label.switching::stephens`

A simple comparison with the `stephens` function of the `label.switching` package might offer some insights into the efficiency gains. We use the posterior draws of the `pi` parameter from the `mmsbm` dataset as input:
```r
# set seed and extract the posterior draws of pi
set.seed(123)
x = extract_n_combine(mmsbm, "pi")
```
Now, rigorous benchmarking would be quite time-consuming; but as the differences in speed are large, we might use the `Sys.time()` function to get a sense of the efficiency gain. First, we use the `stephens` function of the `label.switching` package:
```r
# fit label.switching::stephens
start_t = Sys.time()
res1 = label.switching::stephens(x)
end_t = Sys.time()
print(end_t - start_t)
```
Next, we use the `relabelMCMC` function:
```r
# fit relabelKL::relabelMCMC
start_t = Sys.time()
res2 = relabelKL::relabelMCMC(x, maxit = 100L, verbose = FALSE, log_p = FALSE)
end_t = Sys.time()
print(end_t - start_t)
```
and with multiple threads:

```r
# fit relabelKL::relabelMCMC with two threads
start_t = Sys.time()
res3 = relabelKL::relabelMCMC(x, maxit = 100L, verbose = FALSE, nthreads = 2L, log_p = FALSE)
end_t = Sys.time()
print(end_t - start_t)
```
Lastly, we can check that the results are identical:

```r
sum(res1$permutations != res2$perms)
sum(res1$permutations != res3$perms)
```
## References

Stephens, M. 2000. "Dealing with Label Switching in Mixture Models." Journal of the Royal Statistical Society, Series B, 62, 795-809.