knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  echo = TRUE
)
options(digits = 3)

The devtools library can be used to install the TSI package directly from Github:

library(devtools)
install_github('mmansolf/TSI')
library(TSI)
library(mice)

Classical Test Theory with One Mismeasured Variable

Example data

First, let's simulate some data to demonstrate true score imputation for classical test theory! Here I'm simulating two samples of true score x, with mean difference equal to d:

#######################
#SIMULATION PARAMETERS#
set.seed(0)
n = 1000 #sample size
ratio = 0.6 #ratio of true variance to error variance; squared reliability
d = 0.5 #mean difference
mean_x = 0 #mean of true score
mean_w = 0 #mean of observed score
var_x = 1 #variance of true score
var_w = 1 #variance of observed score
#################
# SIMUALTE DATA #
x1 = rnorm(n, mean_x, sqrt(var_x))
e1 = rnorm(n, 0, sqrt(var_x * (1 - ratio) / ratio))
w1 = x1 + e1
x2 = rnorm(n, mean_x + d * sqrt(var_x), sqrt(var_x))
e2 = rnorm(n, 0, sqrt(var_x * (1 - ratio) / ratio))
w2 = x2 + e2

Before imputing, notice that we don't actually have a variable to relate x to include in an imputation model, only two groups. Let's create a dummy-coded variable y, where 0 corresponds to the first group and 1 corresponds to the second group, to create a data set to work with:

w = c(w1, w2)
y = c(rep(0, n), rep(1, n))
data_ctt = data.frame(w, y)
head(data_ctt)

You can access these data directly as the data_ctt object in the TSI package.

True score imputation using the TSI function

I'll present the imputation code, then explain the pieces:

set.seed(0)
mice_data = TSI(data_ctt,
              os_names = "w",
              score_types = "CTT",
              reliability = ratio,
              mean = 0,
              var_ts = 1,
              mice_args = list(m = 5, printFlag = F))

mice_data

For classical test theory, the following are required:

Observe that a new variable, true_w, was generated as part of this call. This variable contains the imputed true score values. We can quickly look at the first imputation using the complete function in the mice package:

head(complete(mice_data))

Trying it yourself

This example is one of the examples available when running example("TSI")

A couple other options

If true score imputation is required for multiple scores, all such variables must be specified in os_names. If these variables share imputation characteristics (score_types, reliability, mean, and/or var_ts), a single value can be specified for each, and this input is recycled for all observed scores specified in os_names.

Instead of specifying mean and var_ts, another option is to specify the metric of the true score if it is a z score (mean = 0, SD = 1; metric = "z"), T-score (mean = 50, SD = 10; metric = "T"), or standard score (mean = 100, SD = 15, metrics = "standard"). Here's an example, using the z score metric because the underlying x has a mean of 0 and SD of 1:

set.seed(0)
mice_data = TSI(data_ctt,
              os_names = "w",
              score_types = "CTT",
              reliability = ratio,
              metrics = "z",
              mice_args = list(m = 5, printFlag = F))

mice_data

Analyzing the imputed data

Once data are generated using TSI, the resulting r class(mice_data) object can be analyzed using multiple imputation techniques. Here, we can use the with functionality of mice to predict x from y:

pool(with(mice_data, lm(true_w~y)))

Comparing this result to that from using the observed score w:

pool(with(mice_data, lm(w~y)))

The results are roughly the same, because under classical test theory, the slope relating the mismeasured x from the measured y is unaffected by the measurement error. One statistic that is affected is the standard deviation of the score. To obtain this, we obtain a list of completed data sets mice_data by using the complete function with "all" as the second input,

mice_data = complete(mice_data, "all")

calculate the standard deviation of each column in each imputed data set, stored as sds,

sds = sapply(mice_data, function(d)apply(d, 2, sd))

and calculate the mean standard deviation across imputations:

apply(sds, 1, mean)

The standard deviation of w is inflated compared to that of the underlying true score, while the standard deviation of the imputed true_w is unbiased.

Expected a Posteriori Scoring with Two Mismeasured Variables

Example data

Because true score imputation can be used in the mice package, it can be used in tandem with conventional multiple imputation. To illustrate, I simulated two uncorrelated, normally-distributed variables, x and m, and used them to generate a new variable y according to a simple regression model:

set.seed(1)
x = rnorm(n, 0, 1)
m = rnorm(n, 0, 1)
y = 1 + 0.4 * x + 0.6 * m + rnorm(n, 0, 1)

Thus, y is related to m and w by the equation y = 0.4x + 0.6m + e, where e~N(0, 1), in a prototypical multiple regression configuration.

Next, I used calibrated item parameters from the NIH Toolbox emotion battery norming study (Kupst et al., 2015) to simulate item responses on the 10-item Perceived Stress Scale for each of x and y. I then scored these item responses using the same item parameters, yielding a set of T-scores and standard errors (SE's) for each record. These two quantities are provided routinely by the HealthMeasures scoring service, and while the standard errors are more often used in assessing individual change, true score imputation allows them to be used in statistical analysis of data sets with multiple observarions.

Lastly, I used a missing completely at random (MCAR) missing data model to set approximately 10% of the values of m and the T-score/SE pairs to missing. This "amputation" was conducted separately for m, for the T-score/SE pairs for x, and for the T-score/SE pairs for y.

We can access the simulated data using the data_eap data set in the TSI package.

head(data_eap)

True score imputation using the TSI function

The code for imputation with two EAP-scored variables is nearly identical to that above for CTT:

set.seed(0)
mice_data = TSI(data_eap,
              os_names = c("Fx", "Fy"),
              se_names = c("SE.Fx", "SE.Fy"),
              metrics = "T",
              score_types = "EAP",
              separated = T,
              ts_names = c("Tx", "Ty"),
              mice_args = c(m = 5, maxit = 5, printFlag = F))

mice_data

head(complete(mice_data))

A few differences:

Note that although there are two variables measured with error (Fx and Fy), only one value is provided for metrics, score_types, and separated, resulting in these settings being applied to all variables measured with error.

Analyzing the imputed data

Once data are generated using TSI, the resulting r class(mice_data) object can once again be analyzed using multiple imputation techniques. For reference, estimating this model using the true scores, with no missing data, yields:

lm(y~x + m)

Using the observed T scores yields:

pool(with(mice_data, lm(Fy~Fx + m)))

Observe that the regression coefficient associated with Fx, the T-score for x, is between 0 and 1, because both T scores are on the same metric; therefore, this estimate can be directly compared to that from the true, complete data above. In contrast, the m variable was not linearly transformed and therefore this estimated coefficient should be divided by 10 because the SD of T scores is 10. Placing this estimate back on the original metric by dividing by 10 yields r round(pool(with(mice_data, lm(Fy~Fx + m)))$pooled[3, "estimate"]/10, 3). Both regression coefficients are biased.

Finally, using true score imputation yields:

pool(with(mice_data, lm(Ty~Tx + m)))

Bias has been substantially reduced, and the fractions of missing information fmi now represent both information lost to missingness and information lost to measurement error.

Trying it yourself

This example is one of the examples available when running example("TSI")

Calling the mice Function Directly with method = "truescore"

The TSI function generates a call to the mice function, but the mice function can also be called directly to perform true score imputation. The core of the TSI package is a function named mice.impute.truescore which can be called from the mice package by setting method = "truescore" for each variable to be imputed using this method. These calls can get complicated, so for most purposes we hope that the TSI function can provide a convenient means to conduct true score imputation, but when an analyst needs to modify this call, they can do so by working with mice directly.

Classical test theory

One convenience provided by the TSI function is that it creates the true score variables automatically. When using mice directly, one must first add empty (all NA values) variables to the data set corresponding to the true scores:

data_ctt_2 = data_ctt
data_ctt_2$true_w = NA

Then, the mice function is called with method = "truescore":

set.seed(0)
mice_data = mice(data_ctt_2, m = 5,
               blocks = list("true_w"),
               method = "truescore",
               calibration = list(os_name = "w",
                                score_type = "CTT",
                                reliability = ratio,
                                mean_ts = mean_x,
                                var_ts = var_x),
               predictorMatrix = matrix(c(1, 1, 0), 1, 3, byrow = T),
               printFlag = F,
               remove.constant = F)
mice_data

A few notes:

The resulting mice_data object is the same, imputation error notwithstanding, from that created in the classical test theory example above:

pool(with(mice_data, lm(w~y)))

pool(with(mice_data, lm(true_w~y)))

mice_data = complete(mice_data, "all")
sds = sapply(mice_data, function(d)apply(d, 2, sd))
apply(sds, 1, mean)

Expected a posteriori scoring for item response theory

For EAP scores, we once again need to add empty true score variables to the data set manually:

data_eap_2 = data_eap
data_eap_2$Tx = NA #nolint
data_eap_2$Ty = NA #nolint

Then, we need a more complicated call to mice:

set.seed(0)
mice_data = mice(data_eap_2, m = 5, maxit = 5,
               method = c("pmm", "pmm", "pmm", "pmm", "pmm",
                        "truescore", "truescore"),
               blocks = list(Fx = "Fx", Fy = "Fy",
                             SE.Fx = "SE.Fx", SE.Fy = "SE.Fy", m = "m",
                           Tx = "Tx", Ty = "Ty"),
               blots = list(Tx = list(calibration = list(os_name = "Fx",
                                                   se_name = "SE.Fx",
                                                   score_type = "EAP",
                                                   mean = 50,
                                                   var_ts = 100,
                                                   separated = T)),
                          Ty = list(calibration = list(os_name = "Fy",
                                                   se_name = "SE.Fy",
                                                   score_type = "EAP",
                                                   mean = 50,
                                                   var_ts = 100,
                                                   separated = T))),
               predictorMatrix = matrix(c(0, 1, 1, 1, 1, 0, 0,
                                        1, 0, 1, 1, 1, 0, 0,
                                        1, 1, 0, 1, 1, 0, 0,
                                        1, 1, 1, 0, 1, 0, 0,
                                        1, 1, 1, 1, 0, 0, 0,
                                        1, 1, 1, 1, 1, 0, 0,
                                        1, 1, 1, 1, 1, 0, 0), 7, 7, byrow = T),
               printFlag = F,
               remove.constant = F)
mice_data

A few extra things to note:

Trying it yourself

These examples are both available when running example("mice.impute.truescore")



mmansolf/TSI documentation built on Aug. 29, 2023, 2:38 a.m.