knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
CLCM with Multinomial Latent Regression (4 latent classes)
Longitudinal CLCM & Transition Matrix
This is an intro to the CLCM R package, which provides a variety of options for fitting confimatory latent class models to data types commonly occurring in clinical data. The novel methodological contribution to the field of statistics/psychometrics is the extension of models from Educational Measurement literature, which typically deals with binary data, to data distributions commonly encountered in clinical datasets.
This first or intro vignette will illustrate some of the capabilities of the package. Data will be simulated using K=1, or 1 factor. This will yield 2 latent classes. All 8 item types available will be simulated for this first example. Then a model will be fit to the data, and the AIC/BIC will be computed. Other vignettes will demonstrate the use of the C2 fit statistic and the transition matrix. The latter is used when data is longitudinal.
NA
.library(CLCM) library(ggplot2) N <- 500 number.timepoints <- 1 item.type <- c('Ordinal', 'Nominal', 'Poisson', 'Neg_Binom', 'ZINB', 'ZIP', 'Normal', 'Beta') sim.categories.j <- c(4, 4, 30, 30, 30, 30, NA, NA) lc.prop <- list('Time_1' = c(0.5, 0.5), 'Time_2' = c(0.5, 0.5) ) Q <- matrix(1, nrow = length(item.type), ncol = 1, dimnames = list(paste0('Item_', 1:length(item.type)), 'F1')) Q
Notice that the Q-matrix has 1 factor/attribute (K=1), which yields 2 latent classes. The simulation function has a default of 2 latent classes, so if you ignore the function call entirely, the function will create this Q-matrix automatically.
Now we have what we need to pass to the function used to simulate the data.
set.seed(03102021) sim.dat <- simulate_clcm(N = N, Q = Q, number.timepoints = number.timepoints, item.type = item.type, categories.j = sim.categories.j, lc.prop = lc.prop)
This yields the simulated item responses. From here we can proceed to estimate the model.
Note the option verbose = F
can be used to stop the function from printing estimation progress.
mod <- clcm(dat = sim.dat$dat, item.type = sim.dat$item.type, item.names = sim.dat$item.names, Q = sim.dat$Q)
This yields the estimated confirmatory latent class model.
Next let's exmaine model fit. Unfortunately, we cannot use the C2 statistic to evaluate absolute model fit due to the item types used here. The C2 statistic can only be used with all ordinal items. In this case, we can only compute the AIC/BIC. Of course, these are relative fit statistics so they aren't very useful without a comparison. Regardless, we can compute it to illustrate that it is available.
aic_bic_clcm(mod = mod)
Next, let's examine classification accuracy. We can compare the true latent class assignment (lca) with the estimated lca. This is performed next. Note that the model output contains the true_lca
because the dataframe of simulated data (sim.dat
) contained that variable.
lca.hat <- mod$dat$lca lca.true <- mod$dat$true_lca prop.table(table(lca.true == lca.hat)) xtabs( ~ lca.true + lca.hat)
Looks pretty, preettty, prettttty good.
Next let's visualize how the item responses vary across the two latent classes.
library(ggplot2) dat.plot <- mod$dat dat.plot$lca <- factor(dat.plot$lca, levels = sort(unique(dat.plot$lca)), labels = paste0('Class ', sort(unique(dat.plot$lca)) )) item.names <- mod$item.names for(j in 1:length(item.names) ){ if(item.type[j] %in% c('Ordinal', 'Nominal')){ pp <- ggplot(data = dat.plot, aes(fill=lca, x= as.factor(get(item.names[j])) )) + geom_bar(aes(y = (..count..)/sum(..count..)), color="#e9ecef", alpha=0.6, position="dodge", stat="count") + scale_y_continuous(labels=scales::percent_format(accuracy = 1)) + scale_fill_manual(name = 'Latent Classes', values=c("blue2", "red2")) + theme_minimal() + theme(legend.position = 'bottom') + labs(x = 'Item Responses', y = 'Percentage', title = paste0('Item Name: ', item.names[j]), subtitle = paste0('Item Type: ', item.type[j]), caption = 'Note: Simulated data') suppressMessages(plot(pp)) }# Ordinal if(item.type[j] %in% c('Poisson', 'Neg_Binom', 'ZINB', 'ZIP', 'Normal', 'Beta')){ pp <- ggplot(data = dat.plot, aes(x = get(item.names[j]), group = lca, fill = lca)) + geom_histogram(aes(y = (..count..)/sum(..count..)), color="#e9ecef", alpha=0.6, position = 'identity') + scale_y_continuous(labels=scales::percent_format(accuracy = 1)) + scale_fill_manual(name = 'Latent Classes', values=c("blue2", "red2")) + theme_minimal() + theme(legend.position = 'bottom') + labs(x = 'Item Responses', y = 'Percentage', title = paste0('Item Name: ', item.names[j]), subtitle = paste0('Item Type: ', item.type[j]), caption = 'Note: Simulated data') suppressMessages(plot(pp)) }# Count items }# end loop over items
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.