sampleLCAMixture: Telescoping sampling of the mixture of LCA models where a...

View source: R/sampleLCAMixture.R

sampleLCAMixtureR Documentation

Telescoping sampling of the mixture of LCA models where a prior on the number of components K is specified.

Description

  • The MCMC scheme is implemented as suggested in Malsiner-Walli et al (2024).

  • Also the priors on the model parameters are specified as in Malsiner-Walli et al (2024), see the vignette for details and notation.

Usage

sampleLCAMixture(
  y,
  S,
  L,
  pi,
  eta,
  mu,
  phi,
  a_00,
  a_mu,
  a_phi,
  b_phi,
  c_phi,
  d_phi,
  M,
  burnin,
  thin,
  Kmax,
  s_mu,
  s_phi,
  eps,
  G,
  priorOnWeights,
  d0,
  priorOnK,
  verbose = FALSE
)

Arguments

y

A numeric matrix; containing the data where categories are coded with numbers.

S

A numeric matrix; containing the initial cluster assignments.

L

A numeric scalar; specifiying the number of classes within each component.

pi

A numeric matrix; containing the initial class-specific occurrence probabilities.

eta

A numeric vector; containing the initial cluster sizes.

mu

A numeric matrix; containing the initial central component occurrence probabilities.

phi

A numeric matrix; containing the initial component- and variable-specific precisions.

a_00

A numeric scalar; specifying the prior parameter a_00.

a_mu

A numeric vector; containing the prior parameter a_mu.

a_phi

A numeric vector; containing the prior parameter a_phi for each variable.

b_phi

A numeric vector; containing the initial value of b_phi for each variable.

c_phi

A numeric vector; containing the prior parameter c_phi for each variable.

d_phi

A numeric vector; containing the prior parameter d_phi for each variable.

M

A numeric scalar; specifying the number of recorded iterations.

burnin

A numeric scalar; specifying the number of burn-in iterations.

thin

A numeric scalar; specifying the thinning used for the iterations.

Kmax

A numeric scalar; the maximum number of components.

s_mu

A numeric scalar; specifying the standard deviation of the proposal in the Metropolis-Hastings step when sampling mu.

s_phi

A numeric scalar; specifying the standard deviation of the proposal in the Metropolis-Hastings step when sampling phi.

eps

A numeric scalar; a regularizing constant to bound the Dirichlet proposal away from the boundary in the Metropolis-Hastings step when sampling mu.

G

A character string; either "MixDynamic" or "MixStatic".

priorOnWeights

A named list; providing the prior on the mixture weights.

d0

A numeric scalar; containing the Dirichlet prior parameter on the class weights.

priorOnK

A named list; providing the prior on the number of components K, see priorOnK_spec().

verbose

A logical; indicating if some intermediate clustering results should be printed.

Value

A named list containing:

  • "Eta": sampled weights.

  • "S": sampled assignments.

  • "K": sampled number of components.

  • "Kplus": number of filled, i.e., non-empty components, for each iteration.

  • "Nk": number of observations assigned to the different components, for each iteration.

  • "Nl": number of observations assigned to the different classes within the components, for each iteration.

  • "e0": sampled Dirichlet parameter of the prior on the weights (if e_0 is random).

  • "alpha": sampled Dirichlet parameter of the prior on the weights (if \alpha is random).

  • "acc": logical vector indicating acceptance in the Metropolis-Hastings step when sampling either e_0 or \alpha.

  • "Mu": sampled central component occurrence probabilities.

  • "Phi": sampled precisions.

  • "acc_mu": the acceptance rate in the Metropolis-Hastings step when sampling \mu_{k,j}.

  • "acc_phi": the acceptance rate in the Metropolis-Hastings step when sampling \phi_{k,j}.

  • "nonnormpost_mode": parameter values corresponding to the mode of the nonnormalized posterior.

  • "Pi_k": sampled weighted component occurrence probabilities.

Examples

data("SimData", package = "telescope")
y <- as.matrix(SimData[, 1:30])
z <- SimData[, 31]
N <- nrow(y)
r <- ncol(y)
    
M <- 5
thin <- 1
burnin <- 0
Kmax <- 50  
Kinit <- 10
    
G <- "MixDynamic"
priorOnAlpha <- priorOnAlpha_spec("gam_1_2")
priorOnK <- priorOnK_spec("Pois_1")
d0 <- 1  

cat <- apply(y, 2, max)
a_mu <- rep(20, sum(cat))
mu_0 <- matrix(rep(rep(1/cat, cat), Kinit),
  byrow = TRUE, nrow = Kinit)

c_phi <- 30; d_phi <- 1; b_phi <- rep(10, r)
a_phi <- rep(1, r)
phi_0 <- matrix(cat, Kinit, r, byrow = TRUE)

a_00 <- 0.05

s_mu <- 2; s_phi <- 2; eps <- 0.01 

set.seed(1234)
cl_y <- kmeans(y, centers = Kinit, nstart = 100, iter.max = 50)
S_0 <- cl_y$cluster
eta_0 <- cl_y$size/N

I_0 <- rep(1L, N)
L <- 2
for (k in 1:Kinit) {
  cl_size <- sum(S_0 == k)
  I_0[S_0 == k] <- rep(1:L, length.out = cl_size)
}

index <- c(0, cumsum(cat))
low <- (index + 1)[-length(index)]
up <- index[-1]

pi_km <- array(NA_real_, dim = c(Kinit, L, sum(cat)))
rownames(pi_km) <- paste0("k_", 1:Kinit)
for (k in 1:Kinit) {
  for (l in 1:L) {
    index <- (S_0 == k) & (I_0 == l)
    for (j in 1:r) {
      pi_km[k, l, low[j]:up[j]] <- tabulate(y[index, j], cat[j]) / sum(index)
    }
  }
}
pi_0 <- pi_km 

result <- sampleLCAMixture(
    y, S_0, L,
    pi_0, eta_0, mu_0, phi_0,
    a_00, a_mu, a_phi, b_phi, c_phi, d_phi,
    M, burnin, thin, Kmax,
    s_mu, s_phi, eps,
    G, priorOnAlpha, d0, priorOnK)

telescope documentation built on April 4, 2025, 2:40 a.m.