In haziqj/diagacc: Modelling Diagnostic Accuracy using Latent Class Models

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "README_files/README-"
)

R/diagacc: Modelling diagnostic errors using latent class models

This is the R package accompanying our paper: Evaluating diagnostic tests and quantifying prevalence for Tropical Infectious Diseases: a Paradigm of Latent Class Modelling Approaches With and Without a Gold Standard for Schistosomiasis Diagnosis. There are three models primarily used in this package: latent class model (LC), latent class with random effects model (LCRE), and a finite mixture model (FM). Details of these models and their use for modelling diagnostic errors are found in the paper.

The package includes functions to

Simulate a data set from either the LC, LCRE or FM model.
Wrapper functions to fit such models using MCMC (JAGS) and obtain estimates posterior standard deviations of sensitivities, specificities, prevalence.
Functions to perform a simulation study, as per the following regime:
- Simulate data based on a certain scenario (sensitivities, specificities, prevalance, proportion of missing gold standard, and data generating mechanism)
- Fit the data using LC, LCRE and FM models
- Repeat the above two steps using a different random seed, and results averaged The results are shown in tabular form or in a graph.

Installation and setting options

The easiest way to then install from this repo is by using the devtools package. Install this first.

install.packages("devtools")

Then, run the following code to install and attach the diagacc package.

devtools::install_github("haziqj/diagacc")
library(diagacc)

The default number of items is six (including a gold standard item), with the sensitivities and specificities as shown above. To change this, use the diagacc_opt() function.

# Change sensitivities, specificities and item names
diagacc_opt(sens = c(0.1, 0.2, 0.3), spec = c(0.1, 0.2, 0.3), 
            item.names = LETTERS[1:3])

# Restore default options
diagacc_opt(default = TRUE)

Simulating data sets

The functions to simulate a data set are gen_lc(), gen_lcre() and gen_fm().

# Sample size (n), proportion of missing gold item (miss.prop), and prevalence
# (tau), using a specific random seedmys
X <- gen_lc(n = 1000, miss.prop = 0.5, tau = 0.1, seed = 123)
head(X)

Obtain estimates

The functions for model fitting are fit_lc(), fit_lcre() and fit_fm(). The MCMC options are pretty much the default settings in rjags. These, of course, can be changed--see the help files for more information.

# Fitting a LC model using MCMC
(mod1 <- fit_lc(X))

There is also the option for raw = TRUE, which returns the actual rjags or object for further inspection or manipulation. This is especially useful for MCMC diagnostics.

# Running 8 chains in parallel 
mod2 <- fit_lcre(X, method = "MCMC", raw = TRUE, silent = TRUE, n.chains = 8, 
                 runjags.method = "parallel")
plot(mod2, plot.type = "density", layout = c(3, 2), vars = "beta")

Simulation study

To perform a simulation study, use the run_sim_par() function. For instance, consider the following scenario: sample size = 500, prevalence = 0.1, missing gold standard = 20%, and data generated from a LCRE model. We shall run this for a total of B=8 replications (just for show). The result is a table showing the estimates of the prevalance, sensitivities and specificities for all items except gold standard item, as fitted using the LC, LCRE and FM model.

(res <- run_sim_par(B = 8, n = 1000, tau = 0.1, miss.prop = 0.5,
                    data.gen = "lcre", no.cores = 8))

The run_sim_par() runs the replications concurrently in parallel across the specified number of cores of the machine. There is also a non-parallel implementation of this function called run_sim().

One can also plot the results, as follows:

plot(res)

One thing to mention is that it is possible to add more replications of a particular saved diagaccSim1 object created by run_sim() or run_sim_par() simply by running the following code:

# To run additional B = 100 simulations
run_sim(object = res, B = 100)
run_sim_par(object = res, B = 100)

It will automatically read in all the previous simulation scenarios.

Running multiple simulation studies

Often, there is an interest to run multiple simulation scenarios, for example, multiple sample sizes, prevalences, and proportion of missing gold standard. One might use the above functions in a nested for loop manually, or use the in-built run_study() and run_study_par() functions.

res <- run_study_par(B = 8, n = 250, tau = c(0.08, 0.4), miss.prop = 1.0)

The above code will run a total of six simulation scenarios (1 sample size x 2 prevalences x 1 missing gold proportions x 3 data generating mechanisms). To access any one of these scenarios, use the corresponding sim.key. Currently available methods are print() and plot().