```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = FALSE, warning = FALSE)
requireNamespace("officer")
requireNamespace("rmarkdown")
requireNamespace("officedown")
if (!require(latticeExtra)) {
  install.packages("latticeExtra")
  requireNamespace("latticeExtra")
}
library(magrittr)  # provides the %>% pipe used throughout this template

# Convert a matrix to a formatted flextable.
fmt_tra_other <- function(ft, reset_width = TRUE) {
  c <- ncol(ft)
  ft <- as.data.frame(ft) %>% flextable::flextable()
  ft <- flextable::bold(ft, part = "header")
  ft <- flextable::fontsize(ft, size = 10, part = "all")
  ft <- flextable::font(ft, fontname = "Times New Roman")
  ft <- flextable::align_nottext_col(ft, align = "center")
  ft <- flextable::merge_h(ft, part = "header") %>%
    flextable::merge_v(part = "header") %>%
    flextable::theme_booktabs() %>%
    flextable::bold(part = "header") %>%        # bold header row
    flextable::bold(j = 1, part = "body") %>%   # bold first column
    flextable::align_text_col(align = "center") %>%  # center text columns
    flextable::valign(valign = "center") %>%
    flextable::padding(padding = 2)
  if (reset_width == TRUE) {
    ft <- flextable::autofit(ft) %>%
      flextable::width(j = 1:c, width = (16 / 2.54) / c)
  }
  return(ft)
}

# Render centered, bold text as a Word paragraph.
bold_print <- function(text) {
  text_format <- officer::fp_text(bold = TRUE, font.family = "Times New Roman",
                                  font.size = 12)
  fp <- officer::fp_par(text.align = "center")
  return(officer::fpar(officer::ftext(text, prop = text_format), fp_p = fp))
}

plot_width <- 6.5
```
```{r}
# Variables available in this document:
# Response                - The data uploaded by the user
# dimension               - The dimension of the test
# model                   - The IRT model selected by the user
# MIRT_est_method         - The estimation method selected by the user
# MIRT_person_est_method  - The person estimation method selected by the user
# MIRT_select_independent - The indicator selected for independence tests
# MIRT_itemfit_method     - The indicator selected for item fit tests
# MIRT_modelfit_relat     - The relative fit indices of the IRT model
# MIRT_modelfit           - The absolute fit indices of the IRT model
# MIRT_Q3                 - The Q3 statistics for the independence test
# MIRT_itemfit            - The results of the item fit test
# MIRT_itempar            - The item parameters
# item_info1              - The item information
```
r bold_print("Table of Contents")
r bold_print("List of Figures")
r bold_print("List of Tables")
\newpage
The multidimensional IRT model assumes that a collection of items is influenced by more than one latent trait. These models can be classified into two categories: between-item models and within-item models.
In between-item multidimensional models, each item is assumed to measure a single latent trait out of several traits or dimensions assessed by the test. On the other hand, within-item multidimensional models allow for at least one item to measure multiple traits or dimensions.
Considering their generality and the time they save in estimation, 'TestAnaAPP' employs between-item multidimensional models for data analysis. Users of 'TestAnaAPP' are therefore required to upload between-item test data along with information about the dimensions to be analyzed.
'TestAnaAPP' offers a range of scoring models commonly used in psychological and educational measurement, catering to both dichotomous and polytomous data. The dichotomous scoring models consist of the Rasch model, the one-parameter logistic model (1PL), the two-parameter logistic model (2PL), the three-parameter logistic model (3PL), and the four-parameter logistic model (4PL). The polytomous scoring models encompass the graded response model (GRM), the partial credit model (PCM), and the generalized partial credit model (GPCM).
Because MIRT models are an extension of unidimensional IRT models, the interpretation of item parameters in MIRT models is similar to that in unidimensional IRT models.
Let $X_{ij}$ denote the response of test taker $j$ ($j = 1,2,3,...,J$) to item $i$ ($i = 1,2,3,...,I$) in dichotomous scoring data, where the value of $X_{ij}$ is 0 (incorrect) or 1 (correct). Additionally, $\beta_i$ represents the item difficulty parameter, while $\theta_{jq}$ represents the latent trait of test taker $j$ on dimension $q$ ($q = 1,2,3,...,Q$). In the context of IRT analysis, the difficulty parameters and latent trait parameters of test takers are comparable, allowing us to determine the difficulty level of an item based on the relative values of these two parameters.
'TestAnaAPP' supports the following MIRT models for analyzing dichotomously scored data.
The Multidimensional Rasch model is commonly used for items scored dichotomously. The item response function (IRF) for the Multidimensional Rasch model can be expressed as follows:
$$P(X_{ij} = 1 | \theta_{jq}) = \frac{1}{1+e^{(-(\theta_{jq} - \beta_i))}}\tag{1}$$ The Multidimensional Rasch model (M1PL) makes a clear assumption that the probability $P_{ij}$ of subject $j$ correctly answering item $i$, indicated as $X_{ij}=1$, is influenced by the relative difficulty of item $i$ for subject $j$, denoted as $\theta_{jq}-\beta_i$.
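As a quick illustration, formula (1) can be written as a small R function; the helper name `p_m1pl` and the values below are hypothetical:

```r
# Probability of a correct response under the M1PL (formula 1).
p_m1pl <- function(theta, beta) {
  1 / (1 + exp(-(theta - beta)))
}

p_m1pl(theta = 0.5, beta = -0.2)  # ability above difficulty, so p > 0.5
```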
In contrast to the M1PL model, the M2PL model incorporates the parameter $d_i$, which is associated with the item difficulty parameter $\beta_i$ and item discrimination $\alpha_{iq}$, to describe the psychometric properties of items. The item discrimination $\alpha_{iq}$ represents the discrimination of the $i$th item for the $q$th dimension. In 'TestAnaAPP', for the between-item test, each item is considered to be an indicator for a single dimension, resulting in item slopes ($\alpha_{iq}$) of 0 for all other dimensions except the $q$th dimension. The item response function (IRF) of the M2PL model is displayed as follows:
$$P(X_{ij} = 1 | \theta_{jq}) = \frac{1}{1 + e^{(-(\alpha_{iq}\theta_{jq} + d_i))}}\tag{2}$$ It is important to note that the parameter $d_i$ cannot represent item difficulty; it functions solely as a threshold parameter. In order to achieve a similar interpretation of item parameters as the unidimensional IRT model, researchers (Reckase, 2009; Zhang & Stone, 2008) have proposed the multidimensional discrimination index $MDISC_i$ (Multidimensional Discrimination) to evaluate discrimination in MIRT. The item difficulty parameter $\beta_i$ can be calculated using $MDISC_i$ and $d_i$.
$$MDISC_i = \sqrt{\sum_{q=1}^Q(\alpha_{iq})^2}$$
$$\beta_i = \frac{-d_i}{MDISC_{i}}$$
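A short worked example of these two formulas, using hypothetical values for an item that loads on the second of three dimensions in a between-item test:

```r
# Hypothetical slopes alpha_iq and intercept d_i for one item.
alpha_i <- c(0, 1.2, 0)  # only the second dimension has a nonzero slope
d_i     <- -0.6

MDISC_i <- sqrt(sum(alpha_i^2))  # multidimensional discrimination = 1.2
beta_i  <- -d_i / MDISC_i        # multidimensional difficulty = 0.5
```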
In the M3PL model, the item guessing parameter $c_i$ represents the subject's probability of guessing item $i$ correctly. The probability $P_{ij}$ of subject $j$ answering item $i$ correctly, denoted as $X_{ij}=1$, is determined by the combination of $(\alpha_{iq}\theta_{jq} + d_i)$. The Item Response Function (IRF) of the M3PL model is shown below:
$$P(X_{ij}=1 | \theta_{jq}) = c_i+(1-c_i)\frac{1}{1 + e^{(-(\alpha_{iq}\theta_{jq} + d_i))}}\tag{3}$$
Similarly, the item discrimination ($MDISC_i$) and difficulty ($\beta_i$) parameters can also be calculated using the aforementioned formula.
The M4PL model introduces an additional item error parameter, $1-u_i$, to account for the possibility of high-ability subjects making errors on item $i$. The formula for the Item Response Function (IRF) of the M4PL model is represented as follows:
$$P(X_{ij}=1 | \theta_{jq}) = c_i+(u_i-c_i)\frac{1}{1+e^{(-(\alpha_{iq}\theta_{jq}+d_i))}}\tag{4}$$
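The four dichotomous models can be summarized in one illustrative R function: with $c_i = 0$ and $u_i = 1$ it reduces to formula (2), and additionally fixing $\alpha_{iq} = 1$ (with $d_i = -\beta_i$) gives formula (1). The helper name and values are hypothetical:

```r
# Generic dichotomous IRF covering formulas (1)-(4).
p_mirt <- function(theta, alpha, d, c = 0, u = 1) {
  c + (u - c) / (1 + exp(-(alpha * theta + d)))
}

p_mirt(theta = 0.5, alpha = 1.2, d = 0.6, c = 0.2, u = 0.95)  # M4PL example
```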
Let $X_{ikj}$ denote the response of subject $j$ $(j = 1,2,3,...,J)$ to item $i$ $(i = 1,2,3,...,I)$ in the context of polytomous data, where $X_{ikj} = k$ and $k \in \{0,1,2,...,m_i\}$. In the polytomous IRT model, the item difficulty parameters are represented by $\beta_{ih}$, where $h \in \{1,2,...,m_i\}$. The latent trait parameter for the subject is denoted by $\theta_{jq}$.
The item parameters of the MPCM model consist solely of difficulty parameters, denoted as $\beta_{ih}$, where $\beta_{ih}$ represents the difficulty of step $h$ on item $i$.
The item response function (IRF) of the MPCM model is as follows:
$$P(X_{ikj}=k | \theta_{jq}) = \frac{e^{\sum_{h=0}^{k}(\theta_{jq}-\beta_{ih})}}{\sum_{c=0}^{m_i}e^{\sum_{h=0}^{c}(\theta_{jq}-\beta_{ih})}}\tag{5}$$
where $P(X_{ikj}=k | \theta_{jq})$ represents the probability that subject $j$ responds to item $i$ with category $k$; by convention, the $h = 0$ term in each sum is defined as zero.
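A minimal sketch of formula (5) for a single item, with the $h = 0$ term set to zero and hypothetical step difficulties:

```r
# Category probabilities for one item under the MPCM (formula 5).
pcm_prob <- function(theta, beta) {       # beta = (beta_i1, ..., beta_im)
  num <- exp(cumsum(c(0, theta - beta)))  # numerators for k = 0, ..., m
  num / sum(num)                          # probability of each category
}

pcm_prob(theta = 0.3, beta = c(-1.0, 0.2, 1.1))  # a four-category item
```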
The Generalized Partial Credit Model (GPCM) enables the measurement of discrimination parameters $\alpha_{iq}$ for each item. In the case of the multidimensional generalized partial credit model (MGPCM), the item response function (IRF) can be described as follows:
$$P(X_{ikj}=k | \theta_{jq}) = \frac{e^{\sum_{h=0}^{k}[(\alpha_{iq}\theta_{jq}+d_{ih})]}}{\sum_{c=0}^{m_i}e^{\sum_{h=0}^{c}[(\alpha_{iq}\theta_{jq}+d_{ih})]}}\tag{6}$$
In the given formula, $d_{ih}$ denotes the threshold parameter for step $h$ of item $i$ and can be transformed into the item difficulty parameter by utilizing $MDISC_i$ (as explained in the formula mentioned above).
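A minimal sketch of formula (6); the MPCM is the special case with all slopes fixed at 1. Values are hypothetical:

```r
# Category probabilities for one item under the MGPCM (formula 6).
gpcm_prob <- function(theta, alpha, d) {       # d = (d_i1, ..., d_im)
  num <- exp(cumsum(c(0, alpha * theta + d)))  # numerators for k = 0, ..., m
  num / sum(num)
}

gpcm_prob(theta = 0.3, alpha = 1.4, d = c(1.0, -0.2, -1.1))
```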
The MPCM and MGPCM models are both built on the premise of modeling the probability of response categories $k$ for a given item $i$. In contrast, the MGRM model focuses on modeling the cumulative probability of response categories $k$ or higher. The MGRM model employs the following function to represent the cumulative response probability:
$$P(X_{ij}{\geq} k | \theta_{jq}) = \frac{e^{[(\alpha_{iq}\theta_{jq}+d_{ik})]}}{1+e^{[(\alpha_{iq}\theta_{jq}+d_{ik})]}}\tag{7}$$
where $P(X_{ij}{\geq} k | \theta_{jq})$ represents the probability that subject $j$ responds to item $i$ with category $k$ or a higher category. The parameter $d_{ik}$ represents the threshold for response category $k$ on item $i$. The probability of a specific response category can be computed by subtracting the adjacent cumulative probabilities.
$$P(X_{ij}= k | \theta_{jq})=P(X_{ij}{\geq} k | \theta_{jq})-P(X_{ij}{\geq} k+1 | \theta_{jq})\tag{8}$$
It is important to note that, in contrast to the GPCM and PCM models, the GRM model assumes a relationship where $\beta_{ik}{\geq}{\beta_{i(k-1)}}$, implying that the difficulty of higher categories is consistently greater than that of lower categories. Similarly, the $\beta_{ik}$ parameter of the MGRM model can also be calculated using the aforementioned formula.
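A minimal sketch of formulas (7) and (8), computing MGRM category probabilities by differencing adjacent cumulative probabilities (all values hypothetical):

```r
# Category probabilities for one item under the MGRM (formulas 7 and 8).
grm_prob <- function(theta, alpha, d) {        # d = (d_i1, ..., d_im)
  p_cum <- c(1, plogis(alpha * theta + d), 0)  # P(X >= 0), ..., P(X >= m+1)
  -diff(p_cum)                                 # P(X = k) for k = 0, ..., m
}

grm_prob(theta = 0.3, alpha = 1.4, d = c(1.5, 0.0, -1.5))
```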
The results presented in this document were estimated based on the settings you selected in the interactive 'TestAnaAPP' program, along with default options within the program. The following settings were employed:
- IRT model: `r model`
- Estimation method: `r MIRT_est_method`
- Person estimation method: `r MIRT_person_est_method`
- Statistic for the independence test: `r MIRT_select_independent`
- Statistic for the item fit test: `r MIRT_itemfit_method`

The uploaded test structure (dimension) is displayed below.
```{r}
fmt_tra_other(cbind(rownames(dimension), dimension))
```
The model fitting results include both relative fit indices and absolute fit indices. Relative fit indices are used to compare the fit of competing models: when two IRT models are fitted to the same dataset, the model with the more favorable relative fit index values is generally considered to fit the data comparatively better. Absolute fit indices reflect the degree of absolute fit between the model and the data; if they fall within an acceptable range, the model fits the data well.
'TestAnaAPP' utilizes the 'mirt' package for parameter estimation and calculates relative fit indices, including AIC, SABIC, HQ, BIC, and log-likelihood. Smaller values of AIC, SABIC, HQ, and BIC indicate better model fit, whereas a larger log-likelihood indicates better fit. To use relative fit indices for IRT model selection, you can fit several models in 'TestAnaAPP' and compare the corresponding relative fit index values.
```{r}
fmt_tra_other(ft = MIRT_modelfit_relat)
```
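For readers who wish to reproduce such a comparison outside the app, relative fit indices can be obtained directly from the 'mirt' package. A minimal sketch, assuming the uploaded `Response` data are dichotomous and using a unidimensional specification and illustrative item types:

```r
library(mirt)

# Fit two candidate models to the same data (settings are illustrative).
fit_2pl <- mirt(Response, model = 1, itemtype = "2PL", verbose = FALSE)
fit_3pl <- mirt(Response, model = 1, itemtype = "3PL", verbose = FALSE)

# anova() reports AIC, SABIC, HQ, BIC, and the log-likelihood side by side.
anova(fit_2pl, fit_3pl)

# Individual indices can also be extracted from a fitted model.
extract.mirt(fit_2pl, what = "AIC")
```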
'TestAnaAPP' generates absolute fit indices, including M2, RMSEA, SRMSR, TLI, and CFI.
```{r}
fmt_tra_other(ft = MIRT_modelfit)
```
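Absolute fit indices can likewise be computed from a fitted 'mirt' model; a minimal sketch, assuming a hypothetical fitted model object `fit`:

```r
# M2() returns the M2 statistic together with RMSEA, SRMSR, TLI, and CFI.
mirt::M2(fit)
```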
'TestAnaAPP' provides the Q3 and LD-X2 statistics for testing local independence in IRT. Larger absolute values indicate stronger dependence between two items. Because cutoff standards vary across studies, no uniform thresholds for Q3 and LD-X2 are provided; readers are advised to refer to professional books or papers when interpreting these statistics.
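For reference, both statistics can be computed from a fitted 'mirt' model (here a hypothetical object `fit`):

```r
library(mirt)

residuals(fit, type = "Q3")  # Q3: correlations between item residuals
residuals(fit, type = "LD")  # LD-X2: pairwise local dependence statistics
```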
For the independence test in this analysis, `r MIRT_select_independent` was selected as the statistic.
```{r}
# Color Q3 values whose absolute value exceeds the user-defined threshold;
# the diagonal entries (Q3 = 1) stay black.
Q3_color <- function(x, value = MIRTreport_Q3_highlight) {
  out <- rep("black", length(x))
  out[abs(x) > value] <- "red"
  out[abs(x) == 1] <- "black"
  out
}

MIRT_Q3_print <- cbind("Item" = colnames(MIRT_Q3), MIRT_Q3) %>%
  as.data.frame()
fmt_tra_other(MIRT_Q3_print, reset_width = FALSE) %>%
  flextable::color(j = 2:ncol(MIRT_Q3_print), color = Q3_color)
```
Note: Red indicates that the absolute value of the Q3 statistic exceeds the threshold you set in 'TestAnaAPP'.
The item fit test assesses whether there is a significant disparity between the responses predicted by the model and the actual responses. 'TestAnaAPP' offers the $X^2$ statistic for conducting the item fit test. A $P$ value below 0.05 suggests a significant difference between the model's predicted values and the actual data, and an RMSEA value exceeding 0.1 likewise indicates a substantial disparity. A comprehensive judgment can be made from the testing context and these statistical indicators; selecting an alternative model for fitting may yield different results.
```{r}
fmt_tra_other(MIRT_itemfit)
```
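For reference, a minimal sketch of an item fit test with the 'mirt' package, assuming a hypothetical fitted model object `fit`; S-X2 is one of several item fit statistics 'mirt' offers:

```r
# Returns S_X2, its degrees of freedom, RMSEA, and the p value per item.
mirt::itemfit(fit, fit_stats = "S_X2")
```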
Item parameters serve as crucial indicators for assessing item quality. The criteria below, derived from *The Basics of Item Response Theory Using R*, can be used to evaluate the item discrimination values of the test under analysis.
`r bold_print("Table: Labels for Item Discrimination Values")``
| Verbal label | Range of values | Typical value |
|:------------:|:---------------:|:-------------:|
| None         | 0               | 0.00          |
| Very low     | 0.01-0.34       | 0.18          |
| Low          | 0.35-0.64       | 0.50          |
| Moderate     | 0.65-1.34       | 1.00          |
| High         | 1.35-1.69       | 1.50          |
| Very high    | >1.70           | 2.00          |
| Perfect      | $+\infty$       | $+\infty$     |
In this analysis, the `r model` model was selected, and `r MIRT_est_method` was used for parameter estimation. Generally, discrimination and difficulty are assessed using $MDISC_i$ and $\beta_i$. Therefore, 'TestAnaAPP' transforms the $\alpha_{iq}$ and $d_i$ parameters into $MDISC_i$ and $\beta_i$, respectively. In a between-item test, $MDISC_i$ equals the absolute value of $\alpha_{iq}$, since each item has only one nonzero slope. To obtain the $d_i$ parameter in the formulas above, users can compute $d_i = -\beta_i \times MDISC_i$.
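A minimal sketch of this transformation using 'mirt' output, assuming a hypothetical fitted dichotomous model object `fit`:

```r
library(mirt)

# Slopes (a1, ..., aQ) and intercepts (d) from the fitted model.
par_mat <- coef(fit, simplify = TRUE)$items
slopes  <- par_mat[, grep("^a", colnames(par_mat)), drop = FALSE]
d       <- par_mat[, "d"]

MDISC <- sqrt(rowSums(slopes^2))  # multidimensional discrimination
beta  <- -d / MDISC               # multidimensional difficulty
d_back <- -beta * MDISC           # recovers the original intercepts
```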
The item parameters are as follows:
```{r}
# Color discrimination values that fall below the user-defined threshold.
disc_color <- function(x, value = MIRTreport_alpha_highlight) {
  out <- rep("black", length(x))
  out[x < value] <- "red"
  out
}

MIRT_itempar_print <- cbind("Item" = MIRT_itemfit[, 1], MIRT_itempar)
fmt_tra_other(MIRT_itempar_print) %>%
  flextable::color(j = colnames(MIRT_itempar_print) %>%
                     stringr::str_which(pattern = "Discrimination"),
                   color = disc_color)
```
Note: Red indicates that the item discrimination value is below the threshold you set in 'TestAnaAPP'.
IRT places item difficulty and participants' latent traits on the same scale, making them directly comparable, which is a significant advantage over CTT. The Wright map serves as a visual tool for comparing the distributions of item difficulty and latent traits. By comparing these two distributions, you can judge how difficult the test is relative to the subjects and identify the harder and easier items.
It is important to note that the test can effectively differentiate the subjects' latent trait levels only when it possesses appropriate difficulty. Otherwise, ceiling or floor effects may arise, whereby the measurement tool inaccurately assesses the subjects' true abilities, resulting in all subjects receiving either high or low scores.
For multidimensional IRT models, 'TestAnaAPP' displays the Wright map for each dimension on the interactive interface. You can download any of the `r mode$F_n` Wright maps from the interactive interface of 'TestAnaAPP'.
```{r}
print(MIRT_wright)
```
The Item Characteristic Curve (ICC) visualizes the probability of subjects obtaining a specific score based on the predictions of the statistical model.
It allows us to observe the probability of subjects with different levels of latent traits obtaining a specific score on each item.
Meanwhile, it provides insights into the characteristics of items for subjects with varying levels of latent traits.
Detailed information on interpreting the ICC can be found in professional books.
Because each item in a between-item test measures a single latent trait, 'TestAnaAPP' constructs each item's ICC figure against its corresponding trait, which is an acceptable simplification.
```{r}
print(MIRT_ICC)
```
Item information represents the level of measurement error associated with each item. Higher values indicate a greater contribution of the item to the accurate measurement of the latent trait of the subjects. The value of item information is often related to the discriminatory power and difficulty level of the item. The Item Information Curve (IIC) visually represents the item information and allows readers to assess the contribution of the items to the accurate measurement of a specific level of latent trait.
Professional books can provide guidance on interpreting the IIC curve for those seeking to gain a comprehensive understanding.
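For readers who want item information values outside the app, a minimal sketch with the 'mirt' package, assuming for brevity a hypothetical fitted unidimensional model object `fit`:

```r
library(mirt)

theta_grid <- matrix(seq(-4, 4, by = 0.1))   # grid of latent trait values
item1 <- extract.item(fit, item = 1)         # pull out the first item
info1 <- iteminfo(item1, Theta = theta_grid) # item information at each theta
```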
As with the ICC figures, in a between-item test the IIC figures are based on the single latent trait that each item measures, which is an acceptable simplification.
```{r}
print(MIRT_IIC)
```
Test information is a quantitative measure of test accuracy in IRT, calculated by summing the information obtained from each item. A higher test information value indicates a more precise measurement of the latent trait for the subjects being assessed. 'TestAnaAPP' offers test information curves (TIC) and measurement error curves as graphical representations to demonstrate the quality of the test. A consistent conversion relationship exists between test information ($I(\theta)$) and measurement error ($SE$), which can be expressed by the following formula:
$$SE = \frac{1}{\sqrt{I(\theta)}}$$
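A quick worked example of this conversion, using hypothetical information values:

```r
I_theta <- c(4, 9, 16)   # hypothetical test information values
SE <- 1 / sqrt(I_theta)  # 0.500, 0.333, 0.250
```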
Since the test is a between-item test, where each item measures only one of several latent traits, it is not feasible to visually represent $\theta$ in multiple dimensions. Therefore, 'TestAnaAPP' is designed to display one dimension of the TIC at a time. Below is the TIC that you have chosen to display in the user interface.
```{r}
if (mode$is.within_item == TRUE) {
  NULL %>% print()  # within-item tests are not plotted here
} else {
  for (i in 1:mode$F_n) {
    TIC_dim <- mode$F_names[i]
    # Theta grid and test information for this dimension.
    sim_theta1_infor1 <- item_info1[, c(TIC_dim, paste0(TIC_dim, "infor"))]
    F_name <- TIC_dim
    # Left axis: test information; right axis: measurement error 1/sqrt(I).
    plotrix::twoord.plot(
      lx = sim_theta1_infor1[, 1], ly = sim_theta1_infor1[, 2],
      rx = sim_theta1_infor1[, 1], ry = 1 / sqrt(sim_theta1_infor1[, 2]),
      main = paste0("Test Information and Measurement Error of ", F_name),
      ylab = "Test information", rylab = "Measurement error",
      xlab = "Latent Trait", rcol = "red",
      lytickpos = seq(0, max(sim_theta1_infor1[, 2]),
                      ceiling(max(sim_theta1_infor1[, 2]) / 5)),
      lylim = c(0, max(sim_theta1_infor1[, 2]) + 0.5),
      type = c("l", "p"), rpch = 1
    )
    # Label the maximum of the test information curve.
    text(x = sim_theta1_infor1[which.max(sim_theta1_infor1[, 2]), 1],
         y = max(sim_theta1_infor1[, 2]),
         labels = paste(round(max(sim_theta1_infor1[, 2]), 3)))
  }
}
```
It is important to note that users can download the item information and test information from the user interface. This allows them to create custom figures using the downloaded files.
Note: 'TestAnaAPP' is mainly designed to provide convenient item and test analysis for practitioners in psychological and educational measurement. The purpose of this report is to present the analysis results completely and clearly; providing a universal and optimal interpretation of the results is not its primary objective. Therefore, while we hope users benefit from the user-friendly nature of this program, we also encourage a cautious attitude toward the interpretation of results to avoid potential biases. Additionally, if you discover any errors in this program, we would greatly appreciate you contacting us (jiangyouxiang34@163.com).
\newpage