In nguyenllpsych/thirt: Thurstonian FC-IRT Model

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%"
)

thirt

An R Packages for the Thurstonian Item Response Theory model

The thirt package contains functions designed to simulation data that for forced-choice questionnaires under the Thurstonian Item Response Theory model and to estimate person- and item-parameters using an MCMC algorithm.\

The package homepage is currently hosted here.

Installation

The package is hosted on GitHub and the latest development version can be installed by running this code within an R session:

devtools::install_github("nguyenllpsych/thirt")

Help

Take a look at the help pages for simulate_thirt_params, simulate_thirt_resp, and estimate_thirt_params_mcmc for more information.

Example

Parameter and Response Simulation

We can first simulate the person-parameter $\theta$ and item-parameters $\lambda, \gamma, \psi^2$ given a specific test design. The test conditions that can be adjusted are: the number of respondents, the number of items per question block, the number of negatively keyed items per block (i.e., items with negative loadings $\lambda$), the number of question blocks, and the number of dimensions or traits that are being measured.

# load the package
library(thirt)

# set seed for reproducibility
set.seed(2022)
options(digits = 2)

# specify test conditions
params <- simulate_thirt_params(
  n_person = 2,  # number of respondents
  n_item   = 3,  # number of items per block
  n_neg    = 1,  # number of negatively keyed items per block
  n_block  = 2,  # number of question blocks
  n_dim    = 3)  # number of dimensions or traits being measured

Due to the differing structures across parameters, the simulate_thirt_params() function generates three separate dataframes:

gamma dataframe for $\gamma$: each row corresponds to an item pair within a block. For instance row 4 below shows the item threshold parameter between the first two items of the second question block.
items dataframe for $\lambda$ and $\psi^2$: each row corresponds to an individual item within a block. For instance, row 1 below shows the item loading and unique variance for the first item in the first block, which measures the second dimension.
persons dataframe for $\theta$: each row corresponds to an individual respondent. For instance, row 1 below shows the first respondent's latent trait scores across the three traits measured

# three output dataframes for parameters
params

After we have simulated the person- and item-parameters, we can now simulate a random response pattern based on these values. The simulate_thirt_resp() function takes in three input values: the three dataframes in the same structure as gamma, items, persons.

# simulate response patterns based on previously generated parameters
resp <- do.call(simulate_thirt_resp, params)

This will generate two separate dataframes to represent the response patterns:

items dataframe: each row represents one item within a block. For instance, row 1 shows that the first item in the first block is used to measure the second dimension and that it is positively keyed (i.e., with a positive item loading $\lambda$
resp dataframe: each row represents one respondent within a block. For instance, row 1 shows that, for the first question block, the first respondent ranks the second item the highest, then the third item, then the first item. This response sequence is internally coded as the fourth possible response pattern for a three-item question block. This numbering system is for internal data storing purposes and has no substantive implications.

# two output dataframes for response patterns
resp

In order to generate the indices for the resp column, please run mupp::find_all_permutations(n = <number_of_items_per_block>, init = 1). For user convenience, a list of indices for a 3-item and 4-item block is shown below:

# 3-item block
triplets <- as.data.frame(mupp::find_all_permutations(n = 3, init = 1))
triplets$seq <- paste0(triplets$V1, ", ", triplets$V2, ", ", triplets$V3)
triplets <- data.frame(
  resp = 1:nrow(triplets),
  seq  = triplets$seq)
knitr::kable(triplets)

# 4-item block
quads <- as.data.frame(mupp::find_all_permutations(n = 4, init = 1))
quads$seq <- paste0(quads$V1, ", ", quads$V2, ", ", quads$V3, ", ", quads$V4)
quads <- data.frame(
  resp = 1:nrow(quads),
  seq  = quads$seq)
knitr::kable(quads)

Parameter Estimation

Using the previously simulated response patterns, we can now apply the main estimation function to estimate all person- and item-parameters. In order to apply this function on real data, researchers must structure their data to follow the template of the two resp and items dataframes.

In addition to these two required inputs, we may change the control argument to adjust the MCMC algorithm:

n_iter: the number of MCMC iterations, exclude for default of 10000
n_burnin: the number of burn-in MCMC iterations, exclude for default of 1000
step_size_sd: the step size standard deviation for the random-walk Metropolis step, exclude for default of 0.1 across parameters

# estimate person- and item-parameters based on previously generated response patterns
estimations <- estimate_thirt_params_mcmc(
  resp = resp$resp,
  items = resp$items,
  control = list(n_iter = 100, # number of iterations
                 n_burnin = 20)) # number of burn-ins

The estimated parameter values are computed as the mean across all iterations after the burn-in period, and they may be accessed by calling the mean_mcmc list object stored within the output of estimate_thirt_params_mcmc(). There are four separate dataframes for the different person- and item-parameters. These parameter estimates can then be directly compared to the true simulated parameters that generated the model. We should note, however, that this illustrative example used a very simple data-generating model (with few people, items, and blocks) as well as a very short MCMC chain with only 100 iterations.

# view the estimated parameter values
estimations$mean_mcmc

Fixing Operational Items Parameters

Often, test developers may want to fix item parameters $\gamma$, $\lambda$, and $\psi^2$ for the operational items and calibrate only field test items. Pre-determined parameters for the operational items can be specified with the op_params argument, which defaults to NULL. This argument needs to be a list containing matrices named gamma, lambda, and psisq. Parameter values for the fixed operational items should be provided whereas those for field test items should be left as NA. The order of parameters should be the same as shown in simulate_thirt_params() outputs in the gamma and items data frame for an equivalent design.

When operational items are specified, they alone are used for trait scoring. Test items do not affect trait scores.

In the example below, we consider items 1 and 2 to be operational items. This means we would fix $\lambda$ and $\psi^2$ for these individual items and $\gamma$ for the pair 1-2.

Trait scores can be obtained in mean_mcmc$theta whereas item calibration results can be obtained in mean_mcmc_test$gamma, mean_mcmc_test$lambda, and mean_mcmc_test$psisq. Note that the operational item parameters are fixed.

# matrix for each params
# operational parameters
op_gamma <- matrix(
  # first pair 1-2 is fixed for each of 2 blocks
  c(0.4, NA, NA,
   -0.9, NA, NA)
  )
op_lambda <- matrix(
  # first 2 items are fixed for each of 2 blocks
  c(-0.65, 0.55, NA,
    0.57, -0.40, NA)
  )
op_psisq <- matrix(
  # first 2 items are fixed for each of 2 blocks
  c(0.89, 0.72, NA,
    0.02, 0.41, NA)
)

# rerun the estimation algorithm providing operational params
start_mcmc <- Sys.time()
estimations_op <- estimate_thirt_params_mcmc(
  resp  = resp$resp,
  items = resp$items,
  control = list(n_iter   = 500,
                 n_burnin = 20,
                 step_size_sd = c(0.5,     # theta
                                  0.01,    # gamma
                                  0.01,    # lambda
                                  0.075)), # psisq
  op_params = list(gamma  = op_gamma,
                   lambda = op_lambda,
                   psisq  = op_psisq)
)
end_mcmc   <- Sys.time()

# view the estimated parameter values - person scores
estimations_op$mean_mcmc$theta

# view the estimated parameter values - item parameters
estimations_op$mean_mcmc_test$gamma
estimations_op$mean_mcmc_test$lambda
estimations_op$mean_mcmc_test$psisq