knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) options(tibble.print_min = 10, tibble.print_max = 20, pillar.min_title_chars = 16)

ItemResponseTrees is an R package that allows to fit IR-tree models in mirt, Mplus, or TAM. If you're unfamiliar with IR-trees, the papers of Böckenholt (2012) and Plieninger (2020) are good starting points. If you're familiar with the class of IR-tree models, this vignette will get you started on fitting your models with ItemResponseTrees.

The package automates some of the hassle of IR-tree modeling by means of a consistent syntax. This allows new users to quickly adopt this model class, and this allows experienced users to fit many complex models effortlessly.

Herein, an illustrative example will be shown using a popular IR-tree model for 5-point items (Böckenholt, 2012). The model is applied to a Big Five data set from Jackson (2012), more precisely to nine conscientiousness items.

library("ItemResponseTrees") data("jackson") set.seed(9701) df1 <- jackson[sample(nrow(jackson), 321), paste0("E", 1:9)] df1

The model is defined by the following tree diagram.
The model equations can be directly derived from the diagram:
The probability for a specific category is given by multiplying all parameters along the branch that lead to that category (see also Böckenholt, 2012; Plieninger, 2020).
For example, the branch leading to Category 5 is comprised of the parameters (1-*m*), *t*, and *e*.
The resulting five equations are shown in the right part of the figure.

knitr::include_graphics("../tools/ecn-model.png")

In the ItemResponseTrees package, a model is defined using a specific syntax that consists mainly of three parts.

`Equations:`

Herein, the equation for each response category is listed in the format`cat = p1 * (1-p2)`

, where`cat`

is one of the observed responses (e.g., 1, ..., 5). Furthermore,`p1`

is a freely chosen parameter label, and I've chosen`t`

,`e`

, and`m`

below corresponding to the diagram above.`IRT:`

The parameters in the`Equations`

(and also those in the figure above) actually correspond to latent variables of an IRT model. These latent variables are measured using a number of items/variables, and this is specified in this section using the same parameter labels as in`Equations`

.

The format for this section is highly similar to the MODEL statement in Mplus: a semicolon is used after each definition; loadings (discrimination parameters) can be fixed using`@`

. The syntax below fixes all loadings corresponding to dimensions*e*and*m*to 1 corresponding to a 1PL or Rasch model, whereas all loadings corresponding to dimension*t*are freely estimated (i.e., 2PL-structure).`Class:`

Can be either`Tree`

for an IR-tree model or`GRM`

for a graded response model.

In the following code chunk, the model string for the IR-tree model depicted above is specified and saved as `m1`

.
The helper function `irtree_create_template()`

can assist you in creating such a model string (especially if you know the "pseudoitems").

m1 <- " # IR-tree model for 5-point items (Böckenholt, 2012) Equations: 1 = (1-m)*(1-t)*e 2 = (1-m)*(1-t)*(1-e) 3 = m 4 = (1-m)*t*(1-e) 5 = (1-m)*t*e IRT: t BY E1, E2, E3, E4, E5, E6, E7, E8, E9; e BY E1@1, E2@1, E3@1, E4@1, E5@1, E6@1, E7@1, E8@1, E9@1; m BY E1@1, E2@1, E3@1, E4@1, E5@1, E6@1, E7@1, E8@1, E9@1; Class: Tree "

In case of a graded response model, only two sections of the model string need to be specified.

m2 <- " # Graded response model IRT: t BY E1, E2, E3, E4, E5, E6, E7, E8, E9; Class: GRM "

Subsequently, the function `irtree_model()`

needs to be called, which takes a model string such as `m1`

or `m2`

as its sole argument.
The resulting objects `model1`

and `model2`

of class `irtree_model`

contain all the necessary information for fitting the model.
Furthermore, one may inspect specific elements, for example, the pseudoitems contained in the mapping matrix.

Further information on creating model strings is provided in `?irtree_model`

.

model1 <- irtree_model(m1) model2 <- irtree_model(m2) model1$mapping_matrix

Then, the model can be `fit()`

using one of three different engines.
The ItemResponseTrees package supports the engines mirt, TAM, and Mplus (via the MplusAutomation package).
Additional arguments for the engine, for example, details of the algorithms, can be specified via the `control`

argument.

# mirt can be used with an EM algorithm (the default) or, for example, with the # MH-RM algorithm, which seems a little bit faster here. # See ?mirt::mirt for details. ctrl <- control_mirt(method = "MHRM") fit1 <- fit(model1, data = df1, engine = "mirt", control = ctrl) fit2 <- fit(model2, data = df1, engine = "mirt", control = ctrl)

The easiest way to access the information stored in `fit1`

and `fit2`

is via the functions `glance()`

, `tidy()`

, and `augment()`

(that come from the broom package, which is part of the tidyverse).

Information about model fit is obtained via `glance()`

.
As seen below, the IR-tree model has 41 freely estimated parameters (3 x 9 thresholds + 9 loadings + 2 variances + 3 covariances).
The GRM has 45 estimated parameters (4 x 9 thresholds + 9 loadings).
(Of course, this comparison is a little bit unfair, because the IR-tree model is much more flexible in terms of dimensionality/"random effects" even though it is less flexible with respect to the thresholds/"fixed effects".)

For the present data, the IR-tree model slightly outperforms the GRM according to AIC and BIC, and thus one may conclude that response styles are present in these data.

glance(fit1) rbind(glance(fit1), glance(fit2))

The parameter estimates are obtained via `tidy()`

.
For the IR-tree model, this returns a tibble with 66 rows (pertaining to the fixed and estimated parameters).
Below, the nine threshold/difficulty parameters `t_E*.d`

pertaining to parameter *t* are shown plus the threshold of pseudoitem `e_E1`

.

The latent variances, covariances, and correlations are shown below as well, and these show the typical pattern of a negative correlation between *e* and *m*.^[The order of the processes corresponds to the order of appearance in the section `IRT`

of the model string.
Thus, the order here is *t*, *e*, *m*, such that `COV_33`

is the variance of person parameters for *m*, and `CORR_32`

is the correlation between *m* and *e*.]

tidy(fit1, par_type = "difficulty") tail(tidy(fit1, par_type = "difficulty"), 9)

The factor scores or person parameter estimates are obtained via `augment()`

.
This returns a tibble comprised of the data set, the factor scores (e.g., `.fitted.t`

), and respective standard errors (e.g., `.se.fit.t`

).

The correlation of the scores for the target trait (extraversion in this case) between the IR-tree model and the GRM indicates that the models differ in this respect even though not drastically.

augment(fit1) cor(augment(fit1)$.fitted.t, augment(fit2)$.fitted.t)

**Any scripts or data that you put into this service are public.**

Embedding an R snippet on your website

Add the following code to your website.

For more information on customizing the embed code, read Embedding Snippets.