# mdirt: Multidimensional discrete item response theory In mirt: Multidimensional Item Response Theory

## Description

mdirt fits a variety of item response models with discrete latent variables. These include, but are not limited to, latent class analysis, multidimensional latent class models, multidimensional discrete latent class models, DINA/DINO models, grade of measurement models, C-RUM, and so on. If response models are not defined explicitly then customized models can defined using the createItem function.

## Usage

  1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 mdirt( data, model, customTheta = NULL, structure = NULL, item.Q = NULL, nruns = 1, method = "EM", covdata = NULL, formula = NULL, itemtype = "lca", optimizer = "nlminb", return_max = TRUE, group = NULL, GenRandomPars = FALSE, verbose = TRUE, pars = NULL, technical = list(), ... ) 

## Arguments

 data a matrix or data.frame that consists of numerically ordered data, with missing data coded as NA model number of mutually exclusive classes to fit, or alternatively a more specific mirt.model definition (which reflects the so-called Q-matrix). Note that when using a mirt.model, the order with which the syntax factors/attributes are defined are associated with the columns in the customTheta input customTheta input passed to technical = list(customTheta = ...), but is included directly in this function for convenience. This input is most interesting for discrete latent models because it allows customized patterns of latent classes (i.e., defines the possible combinations of the latent attribute profile). The default builds the pattern customTheta = diag(model), which is the typical pattern for the traditional latent class analysis whereby class membership mutually distinct and exhaustive. See thetaComb for a quick method to generate a matrix with all possible combinations structure an R formula allowing the profile probability patterns (i.e., the structural component of the model) to be fitted according to a log-linear model. When NULL, all profile probabilities (except one) will be estimated. Use of this input requires that the customTheta input is supplied, and that the column names in this matrix match the names found within this formula item.Q a list of item-level Q-matrices indicating how the respective categories should be modeled by the underlying attributes. Each matrix must represent a K_i \times A matrix, where K_i represents the number of categories for the ith item, and A is the number of attributes included in the Theta matrix; otherwise, a value ofNULL will default to a matrix consisting of 1's for each K_i \times A element except for the first row, which contains only 0's for proper identification. Incidentally, the first row of each matrix must contain only 0's so that the first category represents the reference category for identification nruns a numeric value indicating how many times the model should be fit to the data when using random starting values. If greater than 1, GenRandomPars is set to true by default method estimation method. Can be 'EM' or 'BL' (see mirt for more details) covdata a data.frame of data used for latent regression models formula an R formula (or list of formulas) indicating how the latent traits can be regressed using external covariates in covdata. If a named list of formulas is supplied (where the names correspond to the latent trait/attribute names in model) then specific regression effects can be estimated for each factor. Supplying a single formula will estimate the regression parameters for all latent variables by default itemtype a vector indicating the itemtype associated with each item. For discrete models this is limited to only 'lca' or items defined using a createItem definition optimizer optimizer used for the M-step, set to 'nlminb' by default. See mirt for more details return_max logical; when nruns > 1, return the model that has the most optimal maximum likelihood criteria? If FALSE, returns a list of all the estimated objects group a factor variable indicating group membership used for multiple group analyses GenRandomPars logical; use random starting values verbose logical; turn on messages to the R console pars used for modifying starting values; see mirt for details technical list of lower-level inputs. See mirt for details ... additional arguments to be passed to the estimation engine. See mirt for more details and examples

## Details

Posterior classification accuracy for each response pattern may be obtained via the fscores function. The summary() function will display the category probability values given the class membership, which can also be displayed graphically with plot(), while coef() displays the raw coefficient values (and their standard errors, if estimated). Finally, anova() is used to compare nested models, while M2 and itemfit may be used for model fitting purposes.

## 'lca' model definition

The latent class IRT model with two latent classes has the form

P(x = k|θ_1, θ_2, a1, a2) = \frac{exp(a1 θ_1 + a2 θ_2)}{ ∑_j^K exp(a1 θ_1 + a2 θ_2)}

where the θ values generally take on discrete points (such as 0 or 1). For proper identification, the first category slope parameters (a1 and a2) are never freely estimated. Alternatively, supplying a different grid of θ values will allow the estimation of similar models (multidimensional discrete models, grade of membership, etc.). See the examples below.

When the item.Q for is utilized, the above equation can be understood as

P(x = k|θ_1, θ_2, a1, a2) = \frac{exp(a1 θ_1 Q_{j1} + a2 θ_2 Q_{j2})}{ ∑_j^K exp(a1 θ_1 Q_{j1} + a2 θ_2 Q_{j2})}

where by construction Q is a K_i \times A matrix indicating whether the category should be modeled according to the latent class structure. For the standard latent class model, the Q-matrix has as many rows as categories, as many columns as the number of classes/attributes modeled, and consist of 0's in the first row and 1's elsewhere. This of course can be over-written by passing an alternative item.Q definition for each respective item.

## Author(s)

Phil Chalmers rphilip.chalmers@gmail.com

## References

Chalmers, R., P. (2012). mirt: A Multidimensional Item Response Theory Package for the R Environment. Journal of Statistical Software, 48(6), 1-29.

Proctor, C. H. (1970). A probabilistic formulation and statistical analysis for Guttman scaling. Psychometrika, 35, 73-78. doi: 10.18637/jss.v048.i06

thetaComb, fscores, mirt.model, M2, itemfit, boot.mirt, mirtCluster, wald, coef-method, summary-method, anova-method, residuals-method
  1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 #LSAT6 dataset dat <- expand.table(LSAT6) # fit with 2-3 latent classes (mod2 <- mdirt(dat, 2)) ## Not run: (mod3 <- mdirt(dat, 3)) summary(mod2) residuals(mod2) residuals(mod2, type = 'exp') anova(mod2, mod3) M2(mod2) itemfit(mod2) # generate classification plots plot(mod2) plot(mod2, facet_items = FALSE) plot(mod2, profile = TRUE) # available for polytomous data mod <- mdirt(Science, 2) summary(mod) plot(mod) plot(mod, profile=TRUE) # classification based on response patterns fscores(mod2, full.scores = FALSE) # classify individuals either with the largest posterior probability..... fs <- fscores(mod2) head(fs) classes <- 1:2 class_max <- classes[apply(apply(fs, 1, max) == fs, 1, which)] table(class_max) # ... or by probability sampling (i.e., plausible value draws) class_prob <- apply(fs, 1, function(x) sample(1:2, 1, prob=x)) table(class_prob) # plausible value imputations for stochastic classification in both classes pvs <- fscores(mod2, plausible.draws=10) tabs <- lapply(pvs, function(x) apply(x, 2, table)) tabs[[1]] # fit with random starting points (run in parallel to save time) mirtCluster() mod <- mdirt(dat, 2, nruns=10) #-------------------------- # Grade of measurement model # define a custom Theta grid for including a 'fuzzy' class membership (Theta <- matrix(c(1, 0, .5, .5, 0, 1), nrow=3 , ncol=2, byrow=TRUE)) (mod_gom <- mdirt(dat, 2, customTheta = Theta)) summary(mod_gom) #----------------- # Multidimensional discrete latent class model dat <- key2binary(SAT12, key = c(1,4,5,2,3,1,2,1,3,1,2,4,2,1,5,3,4,4,1,4,3,3,4,1,3,5,1,3,1,5,4,5)) # define Theta grid for three latent classes (Theta <- thetaComb(0:1, 3)) (mod_discrete <- mdirt(dat, 3, customTheta = Theta)) summary(mod_discrete) # Located latent class model model <- mirt.model('C1 = 1-32 C2 = 1-32 C3 = 1-32 CONSTRAIN = (1-32, a1), (1-32, a2), (1-32, a3)') (mod_located <- mdirt(dat, model, customTheta = diag(3))) summary(mod_located) #----------------- ### DINA model example # generate some suitable data for a two dimensional DINA application # (first columns are intercepts) set.seed(1) Theta <- expand.table(matrix(c(1,0,0,0, 1,1,0,0, 1,0,1,0, 1,1,1,1), 4, 4, byrow=TRUE), freq = c(200,200,100,500)) a <- matrix(c(rnorm(15, -1.5, .5), rlnorm(5, .2, .3), numeric(15), rlnorm(5, .2, .3), numeric(15), rlnorm(5, .2, .3)), 15, 4) guess <- plogis(a[11:15,1]) # population guess slip <- 1 - plogis(rowSums(a[11:15,])) # population slip dat <- simdata(a, Theta=Theta, itemtype = 'lca') # first column is the intercept, 2nd and 3rd are attributes theta <- cbind(1, thetaComb(0:1, 2)) theta <- cbind(theta, theta[,2] * theta[,3]) #DINA interaction of main attributes model <- mirt.model('Intercept = 1-15 A1 = 1-5 A2 = 6-10 A1A2 = 11-15') # last 5 items are DINA (first 10 are unidimensional C-RUMs) DINA <- mdirt(dat, model, customTheta = theta) coef(DINA, simplify=TRUE) summary(DINA) M2(DINA) # fits well (as it should) cfs <- coef(DINA, simplify=TRUE)\$items[11:15,] cbind(guess, estguess = plogis(cfs[,1])) cbind(slip, estslip = 1 - plogis(rowSums(cfs))) ### DINO model example theta <- cbind(1, thetaComb(0:1, 2)) # define theta matrix with negative interaction term (theta <- cbind(theta, -theta[,2] * theta[,3])) model <- mirt.model('Intercept = 1-15 A1 = 1-5, 11-15 A2 = 6-15 Yoshi = 11-15 CONSTRAIN = (11,a2,a3,a4), (12,a2,a3,a4), (13,a2,a3,a4), (14,a2,a3,a4), (15,a2,a3,a4)') # last five items are DINOs (first 10 are unidimensional C-RUMs) DINO <- mdirt(dat, model, customTheta = theta) coef(DINO, simplify=TRUE) summary(DINO) M2(DINO) #doesn't fit as well, because not the generating model ## C-RUM (analogous to MIRT model) theta <- cbind(1, thetaComb(0:1, 2)) model <- mirt.model('Intercept = 1-15 A1 = 1-5, 11-15 A2 = 6-15') CRUM <- mdirt(dat, model, customTheta = theta) coef(CRUM, simplify=TRUE) summary(CRUM) # good fit, but over-saturated (main effects for items 11-15 can be set to 0) M2(CRUM) #------------------ #multidimensional latent class model dat <- key2binary(SAT12, key = c(1,4,5,2,3,1,2,1,3,1,2,4,2,1,5,3,4,4,1,4,3,3,4,1,3,5,1,3,1,5,4,5)) # 5 latent classes within 2 different sets of items model <- mirt.model('C1 = 1-16 C2 = 1-16 C3 = 1-16 C4 = 1-16 C5 = 1-16 C6 = 17-32 C7 = 17-32 C8 = 17-32 C9 = 17-32 C10 = 17-32 CONSTRAIN = (1-16, a1), (1-16, a2), (1-16, a3), (1-16, a4), (1-16, a5), (17-32, a6), (17-32, a7), (17-32, a8), (17-32, a9), (17-32, a10)') theta <- diag(10) # defined explicitly. Otherwise, this profile is assumed mod <- mdirt(dat, model, customTheta = theta) coef(mod, simplify=TRUE) summary(mod) #------------------ # multiple group with constrained group probabilities dat <- key2binary(SAT12, key = c(1,4,5,2,3,1,2,1,3,1,2,4,2,1,5,3,4,4,1,4,3,3,4,1,3,5,1,3,1,5,4,5)) group <- rep(c('G1', 'G2'), each = nrow(SAT12)/2) Theta <- diag(2) # the latent class parameters are technically located in the (nitems + 1) location model <- mirt.model('A1 = 1-32 A2 = 1-32 CONSTRAINB = (33, c1)') mod <- mdirt(dat, model, group = group, customTheta = Theta) coef(mod, simplify=TRUE) summary(mod) #------------------ # Probabilistic Guttman Model (Proctor, 1970) # example analysis can also be found in the sirt package (see ?prob.guttman) data(data.read, package = 'sirt') head(data.read) Theta <- matrix(c(1,0,0,0, 1,1,0,0, 1,1,1,0, 1,1,1,1), 4, byrow=TRUE) model <- mirt.model("INTERCEPT = 1-12 C1 = 1,7,9,11 C2 = 2,5,8,10,12 C3 = 3,4,6") mod <- mdirt(data.read, model, customTheta=Theta) summary(mod) M2(mod) itemfit(mod) ## End(Not run)