cat.dt tutorial
In cat.dt: Computerized Adaptive Testing and Decision Trees

knitr::opts_chunk$set(echo = TRUE)

The cat.dt package implements the Merged Tree-CAT method (Rodríguez-Cuadrado et al., 2019, doi.org/10.1016/j.eswa.2019.113066) aimed at creating Computerized Adaptive Tests (CATs) in a fast and efficient way. The package stores the CAT in a tree structure where each node contains an item of the test. The examinee starts from a root node and progresses through the tree, depending on the responses provided to the items found.

The cat.dt package includes the following functionalities:

Building of the Tree-CAT.
Graphical display of the Tree-CAT created.
Estimation of the ability level for an individual or group of examinees.
Graphical visualization of the evolution of the estimation of the ability level of each examinee according to their responses.

Tree-CAT creation

The package can be installed from CRAN:

install.packages("cat.dt")

library(cat.dt)

It can also be installed from the development version’s github repository github.com/jlaria/cat.dt.

Once the package is installed, the Tree-CAT is built by calling the main function CAT_DT. This function has the following input parameters:

bank: Item bank. It must be a data frame in which each row represents an item and each column one of its parameters. If the probabilistic response model chosen is the Graded Response Model (GRM, polytomous items with ordered responses) (Samejima, 1969, doi.org/10.1007/BF03372160), the first column must be the discrimination parameter and the remaining columns the difficulty (or location) parameters. If the model is the Nominal Response Model (NRM, polytomous items without ordered responses) (Bock, 1972, doi.org/10.1007/BF02291411), the odd columns must be the slope parameters and the even columns the intercept parameters.
model: CAT probabilistic model. Options: "GRM" (default) and "NRM".
crit: Item selection criterion. Options: "MEPV" for the Minimum Expected Posterior Variance (default) or "MFI" for the Maximum Fisher Information.
C: Expected fraction $C$ of participants administered with each item (exposure rate). It can be a vector with as many elements as items in the bank or a positive number if all the items have the same rate. Default: C = 0.3.
stop: vector of two components that represent the decision tree stopping criterion. The first component represents the maximum level $L$ of the decision tree, and the second represents the minimum standard error of the ability level (if it is 0, this second criterion is not applied). Default: stop = c(6,0).
limit: Maximum number $N$ of nodes per level (max. $N=10000$). This is the main parameter that controls the tree growth. It must be a natural number. Default: limit = 200.
inters: Minimum intersection of the density functions of two nodes to be joined. It must be a number between 0 and 1. If the user wants to avoid using this criterion, inters = 0 should be specified. Default: inters = 0.98.
p: Prior probability of the interval whose limits determine a threshold for the distance between estimations of nodes to join. Default: p = 0.9.
dens: Prior density function of the latent level. It must be an R function: dnorm, dunif, etc.
...: Parameters to dens.

Therefore, it is necessary to have a bank of calibrated items in the form of 'data.frame' or 'matrix'. The cat.dt package includes an item bank that will be used in this tutorial:

data("itemBank")

The function CAT_DT is called and the tree is stored in the variable TreeCAT:

TreeCAT = CAT_DT(bank = itemBank, model = "GRM", crit = "MEPV", C = 0.3, stop = c(6, 0.6), limit = 200, inters = 0.98, p = 0.9, dens = dnorm, 0, 1)

The function CAT_DT returns an object of class cat.dt. This object contains a list of the input parameters and also the following elements:

nodes: List with $L+1$ elements (levels). Each level contains a list of the nodes of the corresponding level. The nodes of the additional level $L+1$ only include the estimation and distribution of the ability level, given the responses to the items of the final level $L$. Notice that we may end up with a tree with less than $L+1$ levels, since the stopping criterion of the standard error may prevail for all nodes.
C_left: Residual exposure rate of each item after the CAT construction.
predict: Function that returns the estimated ability level of an examinee after each response and a Bayesian credible interval of the final estimation given their responses to the items from the item bank. These responses must be entered by the user as a numeric vector input. In addition, it returns a vector with the items that have been administered to the examinee and a plot object named graphics that represents the evolution of the ability level estimation through the test.
predict_group: Function that returns a list whose elements are the returned values of the function predict for every examinee.

Tree-CAT summary

The Tree-CAT summary is a description of the Tree-CAT that contains the following elements:

The number of the Tree-CAT levels.
The number of nodes in each Tree-CAT level.
The psychometric probabilistic model used.
The item selection criterion used.
The exposure of every item from the bank after the Tree-CAT creation.
The percentage of items used in the Tree-CAT creation.

The summary is obtained in the following way:

summary(TreeCAT)

Tree-CAT visualization

The Tree-CAT is displayed by calling the plot_tree function. This function takes as input arguments: i) The Tree-CAT created; ii) The number of levels to plot and iii) The index of the root node to start the test. For example, to visualize the first three levels of the tree starting by the root node two:

plot_tree(TreeCAT, levels = 3, tree = 2)

The number within each node represents the item selected for that node and the color of each branch represents the response provided. In the figure above, we can see that the test starts with item 11. If we give answer 1 to that item, we will advance to item 18, and so on.

Evaluation of an individual and a group of examinees

Once the Tree-CAT is created, it can be administered to an individual or a group of participants. To administer the Tree-CAT to an individual, the function predict is used. The arguments of this function are the object Tree-CAT of the class cat.dt and a vector that contains the responses provided by the individual to each item from the item bank. It is important to note that the responses must take integer values from $1$ upwards.

As an example, the response dataset itemRes included in the package will be used. From that dataset, the first individual is evaluated (first row). The output is stored in the variable individual_ev:

set.seed(0)

individual_ev = predict(TreeCAT, itemRes[1, ])

This function returns a list with the following elements:

estimation: Ability level estimation after each response provided by the individual.
llow: Lower limit of the $95\%$ credible interval of the final estimation.
llup: Upper limit of the $95\%$ credible interval of the final estimation.
items: Items administered to the individual.
graphics: Plot object that represents the evolution of the ability level estimation after every response.

The estimation output:

individual_ev$estimation

The credible interval output:

individual_ev$llow
individual_ev$lupp

The administered items output:

individual_ev$items

The plot of the evolution of the ability level estimation:

individual_ev$graphics

This plot represents the estimation of the ability level after responding to each one of the test items. For example, giving the response $1$ to the item $70$ results in an estimate of $-0.5$. Then, after giving the response $3$ to the item $18$, the estimate increases to $0$ approximately, and so on. Note that the value of the response influences whether the estimate decreases or increases.

This results can also be obtained by introducing

individual_ev = TreeCAT$predict(itemRes[1, ])

individual_ev = CAT_ability_est(TreeCAT, itemRes[1, ])

Note: By repeating this same code, the results of this section may be different. If the tree has a number of root nodes greater than one, the same individual may start the test by a different root node and therefore obtain different results when performing the test again.

The function predict is also used to evaluate a group of examinees. However, in this case, the vector of responses is replaced by a matrix containing in each row the answers of each examinee. This function returns a list, where each element in the list represents one of the examinees in the group. Each element contains a list with the same elements that the function returns when it is applied to an individual. As an example, we proceed to store the result in the variable group_ev: