build_lcm: Build a Latent Class Model

View source: R/build_lcm.R

build_lcmR Documentation

Build a Latent Class Model

Description

Function build_lcm is a shortcut for constructing a latent class model as a restricted case of an mhmm object.

Usage

build_lcm(
  observations,
  n_clusters,
  emission_probs,
  formula = NULL,
  data = NULL,
  coefficients = NULL,
  cluster_names = NULL,
  channel_names = NULL
)

Arguments

observations

An stslist object (see TraMineR::seqdef()) containing the sequences, or a list of such objects (one for each channel).

n_clusters

A scalar giving the number of clusters/submodels (not used if starting values for model parameters are given with emission_probs).

emission_probs

A matrix containing emission probabilities for each class by rows, or in case of multichannel data a list of such matrices. Note that the matrices must have dimensions k x s where k is the number of latent classes and s is the number of unique symbols (observed states) in the data. Emission probabilities should follow the ordering of the alphabet of observations (alphabet(observations), returned as symbol_names).

formula

Optional formula of class formula() for the mixture probabilities. Left side omitted.

data

A data frame containing the variables used in the formula. Ignored if no formula is provided.

coefficients

An optional k x l matrix of regression coefficients for time-constant covariates for mixture probabilities, where l is the number of clusters and k is the number of covariates. A logit-link is used for mixture probabilities. The first column is set to zero.

cluster_names

A vector of optional names for the classes/clusters.

channel_names

A vector of optional names for the channels.

Value

Object of class mhmm with the following elements:

  • observations
    State sequence object or a list of such containing the data.

  • transition_probs
    A matrix of transition probabilities.

  • emission_probs
    A matrix or a list of matrices of emission probabilities.

  • initial_probs
    A vector of initial probabilities.

  • coefficients
    A matrix of parameter coefficients for covariates (covariates in rows, clusters in columns).

  • X
    Covariate values for each subject.

  • cluster_names
    Names for clusters.

  • state_names
    Names for hidden states.

  • symbol_names
    Names for observed states.

  • channel_names
    Names for channels of sequence data

  • length_of_sequences
    (Maximum) length of sequences.

  • sequence_lengths
    A vector of sequence lengths.

  • n_sequences
    Number of sequences.

  • n_symbols
    Number of observed states (in each channel).

  • n_states
    Number of hidden states.

  • n_channels
    Number of channels.

  • n_covariates
    Number of covariates.

  • n_clusters
    Number of clusters.

See Also

fit_model() for estimating model parameters; summary.mhmm() for a summary of a mixture model; separate_mhmm() for organizing an mhmm object into a list of separate hmm objects; and plot.mhmm() for plotting mixture models.

Examples

# Simulate observations from two classes
set.seed(123)
obs <- seqdef(rbind(
  matrix(sample(letters[1:3], 500, TRUE, prob = c(0.1, 0.6, 0.3)), 50, 10),
  matrix(sample(letters[1:3], 200, TRUE, prob = c(0.4, 0.4, 0.2)), 20, 10)
))

# Initialize the model
set.seed(9087)
model <- build_lcm(obs, n_clusters = 2)

# Estimate model parameters
fit <- fit_model(model)

# How many of the observations were correctly classified:
sum(summary(fit$model)$most_probable_cluster == rep(c("Class 2", "Class 1"), 
  times = c(500, 200)))

############################################################
## Not run: 
# LCM for longitudinal data

# Define sequence data
data("mvad", package = "TraMineR")
mvad_alphabet <- c(
  "employment", "FE", "HE", "joblessness", "school",
  "training"
)
mvad_labels <- c(
  "employment", "further education", "higher education",
  "joblessness", "school", "training"
)
mvad_scodes <- c("EM", "FE", "HE", "JL", "SC", "TR")
mvad_seq <- seqdef(mvad, 15:86,
  alphabet = mvad_alphabet, states = mvad_scodes,
  labels = mvad_labels, xtstep = 6
)

# Initialize the LCM with random starting values
set.seed(7654)
init_lcm_mvad1 <- build_lcm(
  observations = mvad_seq,
  n_clusters = 2, formula = ~male, data = mvad
)

# Own starting values for emission probabilities
emiss <- rbind(rep(1 / 6, 6), rep(1 / 6, 6))

# LCM with own starting values
init_lcm_mvad2 <- build_lcm(
  observations = mvad_seq,
  emission_probs = emiss, formula = ~male, data = mvad
)

# Estimate model parameters (EM algorithm with random restarts)
lcm_mvad <- fit_model(init_lcm_mvad1,
  control_em = list(restart = list(times = 5))
)$model

# Plot the LCM
plot(lcm_mvad, interactive = FALSE, ncol = 2)

###################################################################

# Binomial regression (comparison to glm)

require("MASS")
data("birthwt")

model <- build_lcm(
  observations = seqdef(birthwt$low), emission_probs = diag(2),
  formula = ~ age + lwt + smoke + ht, data = birthwt
)
fit <- fit_model(model)
summary(fit$model)
summary(glm(low ~ age + lwt + smoke + ht, binomial, data = birthwt))


# Multinomial regression (comparison to multinom)

require("nnet")

set.seed(123)
n <- 100
X <- cbind(1, x1 = runif(n, 0, 1), x2 = runif(n, 0, 1))
coefs <- cbind(0, c(-2, 5, -2), c(0, -2, 2))
pr <- exp(X %*% coefs) + stats::rnorm(n * 3)
pr <- pr / rowSums(pr)
y <- apply(pr, 1, which.max)
table(y)

model <- build_lcm(
  observations = seqdef(y), emission_probs = diag(3),
  formula = ~ x1 + x2, data = data.frame(X[, -1])
)
fit <- fit_model(model)
summary(fit$model)
summary(multinom(y ~ x1 + x2, data = data.frame(X[, -1])))

## End(Not run)

seqHMM documentation built on June 8, 2025, 10:16 a.m.