```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = FALSE, warning = FALSE)
requireNamespace("officer")
requireNamespace("rmarkdown")
requireNamespace("officedown")
if (!require(latticeExtra)) {
  install.packages("latticeExtra")
  requireNamespace("latticeExtra")
}
library(magrittr)  # provides the %>% pipe used throughout this template

# Convert a matrix to a formatted flextable.
fmt_tra_other <- function(ft, reset_width = TRUE) {
  c <- ncol(ft)
  ft <- as.data.frame(ft) %>% flextable::flextable()
  ft <- flextable::bold(ft, part = "header")
  ft <- flextable::fontsize(ft, size = 10, part = "all")
  ft <- flextable::font(ft, fontname = "Times New Roman")
  ft <- flextable::align_nottext_col(ft, align = "center")
  ft <- flextable::merge_h(ft, part = "header") %>%
    flextable::merge_v(part = "header") %>%
    flextable::theme_booktabs() %>%
    flextable::bold(part = "header") %>%        # bold header row
    flextable::bold(j = 1, part = "body") %>%   # bold first column
    flextable::align_text_col(align = "center") %>%  # center text columns
    flextable::valign(valign = "center") %>%
    flextable::padding(padding = 2)
  if (reset_width == TRUE) {
    ft <- flextable::autofit(ft) %>%
      flextable::width(j = 1:c, width = (16 / 2.54) / c)
  }
  return(ft)
}

# Render centered, bold text as a Word paragraph.
bold_print <- function(text) {
  text_format <- officer::fp_text(bold = TRUE, font.family = "Times New Roman",
                                  font.size = 12)
  fp <- officer::fp_par(text.align = "center")
  return(officer::fpar(officer::ftext(text, prop = text_format), fp_p = fp))
}

plot_width <- 6.5
```
```{r}
# Variables available in this document:
# Response                - The data uploaded by the user
# dimension               - The dimension of the test
# model                   - The IRT model selected by the user
# MIRT_est_method         - The estimation method selected by the user
# MIRT_person_est_method  - The person estimation method selected by the user
# MIRT_select_independent - The indicator selected for independence tests
# MIRT_itemfit_method     - The indicator selected for item fit tests
# MIRT_modelfit_relat     - The relative fit indices of the IRT model
# MIRT_modelfit           - The absolute fit indices of the IRT model
# MIRT_Q3                 - The Q3 statistics for the independence test
# MIRT_itemfit            - The results of the item fit test
# MIRT_itempar            - The item parameters
# item_info1              - The item information
```
r bold_print("Table of Contents")
r bold_print("List of Figures")
r bold_print("List of Tables")
\newpage
The multidimensional IRT model assumes that a collection of items is influenced by more than one latent trait. These models can be classified into two categories: between-item models and within-item models.
In between-item multidimensional models, each item is assumed to measure a single latent trait out of several traits or dimensions assessed by the test. On the other hand, within-item multidimensional models allow for at least one item to measure multiple traits or dimensions.
Considering their generality and the time they save in estimation, 'TestAnaAPP' employs between-item multidimensional models for data analysis. Users of 'TestAnaAPP' are therefore required to upload between-item test data along with information about the dimensions to be analyzed.
'TestAnaAPP' offers a range of scoring models commonly used in psychological and educational measurement, catering to both dichotomous and polytomous data. The dichotomous scoring models consist of the Rasch model, the one-parameter logistic model (1PL), the two-parameter logistic model (2PL), the three-parameter logistic model (3PL), and the four-parameter logistic model (4PL). The polytomous scoring models encompass the graded response model (GRM), the partial credit model (PCM), and the generalized partial credit model (GPCM).
Because MIRT models are an extension of unidimensional IRT models, the interpretation of item parameters in MIRT models is similar to that in unidimensional IRT models.
Let $X_{ij}$ denote the response of test taker $j$ ($j = 1,2,3,...,J$) to item $i$ ($i = 1,2,3,...,I$) in dichotomous scoring data, where the value of $X_{ij}$ is 0 (incorrect) or 1 (correct). Additionally, $\beta_i$ represents the item difficulty parameter, while $\theta_{jq}$ represents the latent trait of test taker $j$ on dimension $q$ ($q = 1,2,3,...,Q$). In the context of IRT analysis, the difficulty parameters and latent trait parameters of test takers are comparable, allowing us to determine the difficulty level of an item based on the relative values of these two parameters.
'TestAnaAPP' supports the following MIRT models for analyzing dichotomously scored data.
The Multidimensional Rasch model is commonly used for items scored dichotomously. The item response function (IRF) for the Multidimensional Rasch model can be expressed as follows:
$$P(X_{ij} = 1 | \theta_{jq}) = \frac{1}{1+e^{(-(\theta_{jq} - \beta_i))}}\tag{1}$$ The Multidimensional Rasch model (M1PL) makes a clear assumption that the probability $P_{ij}$ of subject $j$ correctly answering item $i$, indicated as $X_{ij}=1$, is influenced by the relative difficulty of item $i$ for subject $j$, denoted as $\theta_{jq}-\beta_i$.
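As a quick illustration, formula (1) can be written as a small R function; the helper name `p_m1pl` and the values below are hypothetical:

```r
# Probability of a correct response under the M1PL (formula 1).
p_m1pl <- function(theta, beta) {
  1 / (1 + exp(-(theta - beta)))
}

p_m1pl(theta = 0.5, beta = -0.2)  # ability above difficulty, so p > 0.5
```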
In contrast to the M1PL model, the M2PL model incorporates the parameter $d_i$, which is associated with the item difficulty parameter $\beta_i$ and item discrimination $\alpha_{iq}$, to describe the psychometric properties of items. The item discrimination $\alpha_{iq}$ represents the discrimination of the $i$th item for the $q$th dimension. In 'TestAnaAPP', for the between-item test, each item is considered to be an indicator for a single dimension, resulting in item slopes ($\alpha_{iq}$) of 0 for all other dimensions except the $q$th dimension. The item response function (IRF) of the M2PL model is displayed as follows:
$$P(X_{ij} = 1 | \theta_{jq}) = \frac{1}{1 + e^{(-(\alpha_{iq}\theta_{jq} + d_i))}}\tag{2}$$ It is important to note that the parameter $d_i$ cannot represent item difficulty; it functions solely as a threshold parameter. In order to achieve a similar interpretation of item parameters as the unidimensional IRT model, researchers (Reckase, 2009; Zhang & Stone, 2008) have proposed the multidimensional discrimination index $MDISC_i$ (Multidimensional Discrimination) to evaluate discrimination in MIRT. The item difficulty parameter $\beta_i$ can be calculated using $MDISC_i$ and $d_i$.
$$MDISC_i = \sqrt{\sum_{q=1}^Q(\alpha_{iq})^2}$$
$$\beta_i = \frac{-d_i}{MDISC_{i}}$$
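A short worked example of these two formulas, using hypothetical values for an item that loads on the second of three dimensions in a between-item test:

```r
# Hypothetical slopes alpha_iq and intercept d_i for one item.
alpha_i <- c(0, 1.2, 0)  # only the second dimension has a nonzero slope
d_i     <- -0.6

MDISC_i <- sqrt(sum(alpha_i^2))  # multidimensional discrimination = 1.2
beta_i  <- -d_i / MDISC_i        # multidimensional difficulty = 0.5
```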
In the M3PL model, the item guessing parameter $c_i$ represents the subject's probability of guessing item $i$ correctly. The probability $P_{ij}$ of subject $j$ answering item $i$ correctly, denoted as $X_{ij}=1$, is determined by the combination of $(\alpha_{iq}\theta_{jq} + d_i)$. The Item Response Function (IRF) of the M3PL model is shown below:
$$P(X_{ij}=1 | \theta_{jq}) = c_i+(1-c_i)\frac{1}{1 + e^{(-(\alpha_{iq}\theta_{jq} + d_i))}}\tag{3}$$
Similarly, the item discrimination ($MDISC_i$) and difficulty ($\beta_i$) parameters can also be calculated using the aforementioned formula.
The M4PL model introduces an additional item error parameter, $1-u_i$, to account for the possibility of high-ability subjects making errors on item $i$. The formula for the Item Response Function (IRF) of the M4PL model is represented as follows:
$$P(X_{ij}=1 | \theta_{jq}) = c_i+(u_i-c_i)\frac{1}{1+e^{(-(\alpha_{iq}\theta_{jq}+d_i))}}\tag{4}$$
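The four dichotomous models can be summarized in one illustrative R function: with $c_i = 0$ and $u_i = 1$ it reduces to formula (2), and additionally fixing $\alpha_{iq} = 1$ (with $d_i = -\beta_i$) gives formula (1). The helper name and values are hypothetical:

```r
# Generic dichotomous IRF covering formulas (1)-(4).
p_mirt <- function(theta, alpha, d, c = 0, u = 1) {
  c + (u - c) / (1 + exp(-(alpha * theta + d)))
}

p_mirt(theta = 0.5, alpha = 1.2, d = 0.6, c = 0.2, u = 0.95)  # M4PL example
```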
Let $X_{ikj}$ denote the response of subject $j$ $(j = 1,2,3,...,J)$ to item $i$ $(i = 1,2,3,...,I)$ in the context of polytomous data, where $X_{ikj} = k$ and $k \in \{0,1,2,...,m_i\}$. In the polytomous IRT model, the item difficulty parameters are represented by $\beta_{ih}$, where $h \in \{1,2,...,m_i\}$. The latent trait parameter for the subject is denoted by $\theta_{jq}$.
The item parameters of the MPCM model consist solely of difficulty parameters, denoted as $\beta_{ih}$, where $\beta_{ih}$ represents the difficulty of step $h$ on item $i$.
The item response function (IRF) of the MPCM model is as follows:
$$P(X_{ikj}=k | \theta_{jq}) = \frac{e^{\sum_{h=0}^{k}(\theta_{jq}-\beta_{ih})}}{\sum_{c=0}^{m_i}e^{\sum_{h=0}^{c}(\theta_{jq}-\beta_{ih})}}\tag{5}$$
where $P(X_{ikj}=k | \theta_{jq})$ represents the probability that subject $j$ responds to item $i$ with category $k$; by convention, the $h = 0$ term in each sum is defined as zero.
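A minimal sketch of formula (5) for a single item, with the $h = 0$ term set to zero and hypothetical step difficulties:

```r
# Category probabilities for one item under the MPCM (formula 5).
pcm_prob <- function(theta, beta) {       # beta = (beta_i1, ..., beta_im)
  num <- exp(cumsum(c(0, theta - beta)))  # numerators for k = 0, ..., m
  num / sum(num)                          # probability of each category
}

pcm_prob(theta = 0.3, beta = c(-1.0, 0.2, 1.1))  # a four-category item
```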
The Generalized Partial Credit Model (GPCM) enables the measurement of discrimination parameters $\alpha_{iq}$ for each item. In the case of the multidimensional generalized partial credit model (MGPCM), the item response function (IRF) can be described as follows:
$$P(X_{ikj}=k | \theta_{jq}) = \frac{e^{\sum_{h=0}^{k}[(\alpha_{iq}\theta_{jq}+d_{ih})]}}{\sum_{c=0}^{m_i}e^{\sum_{h=0}^{c}[(\alpha_{iq}\theta_{jq}+d_{ih})]}}\tag{6}$$
In the given formula, $d_{ih}$ denotes the threshold parameter for step $h$ of item $i$ and can be transformed into the item difficulty parameter by utilizing $MDISC_i$ (as explained in the formula mentioned above).
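A minimal sketch of formula (6); the MPCM is the special case with all slopes fixed at 1. Values are hypothetical:

```r
# Category probabilities for one item under the MGPCM (formula 6).
gpcm_prob <- function(theta, alpha, d) {       # d = (d_i1, ..., d_im)
  num <- exp(cumsum(c(0, alpha * theta + d)))  # numerators for k = 0, ..., m
  num / sum(num)
}

gpcm_prob(theta = 0.3, alpha = 1.4, d = c(1.0, -0.2, -1.1))
```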
The MPCM and MGPCM models are both built on the premise of modeling the probability of response categories $k$ for a given item $i$. In contrast, the MGRM model focuses on modeling the cumulative probability of response categories $k$ or higher. The MGRM model employs the following function to represent the cumulative response probability:
$$P(X_{ij}{\geq} k | \theta_{jq}) = \frac{e^{[(\alpha_{iq}\theta_{jq}+d_{ik})]}}{1+e^{[(\alpha_{iq}\theta_{jq}+d_{ik})]}}\tag{7}$$
where $P(X_{ij}{\geq} k | \theta_{jq})$ represents the probability that subject $j$ responds to item $i$ with category $k$ or a higher category. The parameter $d_{ik}$ represents the threshold for response category $k$ on item $i$. The probability of a specific response category can be computed by subtracting the adjacent cumulative probabilities.
$$P(X_{ij}= k | \theta_{jq})=P(X_{ij}{\geq} k | \theta_{jq})-P(X_{ij}{\geq} k+1 | \theta_{jq})\tag{8}$$
It is important to note that, in contrast to the GPCM and PCM models, the GRM model assumes a relationship where $\beta_{ik}{\geq}{\beta_{i(k-1)}}$, implying that the difficulty of higher categories is consistently greater than that of lower categories. Similarly, the $\beta_{ik}$ parameter of the MGRM model can also be calculated using the aforementioned formula.
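A minimal sketch of formulas (7) and (8), computing MGRM category probabilities by differencing adjacent cumulative probabilities (all values hypothetical):

```r
# Category probabilities for one item under the MGRM (formulas 7 and 8).
grm_prob <- function(theta, alpha, d) {        # d = (d_i1, ..., d_im)
  p_cum <- c(1, plogis(alpha * theta + d), 0)  # P(X >= 0), ..., P(X >= m+1)
  -diff(p_cum)                                 # P(X = k) for k = 0, ..., m
}

grm_prob(theta = 0.3, alpha = 1.4, d = c(1.5, 0.0, -1.5))
```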
The results presented in this document were estimated based on the settings you selected in the interactive 'TestAnaAPP' program, along with default options within the program. The following settings were employed:
- IRT model: `r model`
- Estimation method: `r MIRT_est_method`
- Person estimation method: `r MIRT_person_est_method`
- Statistic for the independence test: `r MIRT_select_independent`
- Statistic for the item fit test: `r MIRT_itemfit_method`

The uploaded test structure (dimension) is displayed below.
```{r}
fmt_tra_other(cbind(rownames(dimension), dimension))
```
The model fitting results include both relative fit indices and absolute fit indices. Relative fit indices are used to compare the fit of competing models: when two IRT models are fitted to the same dataset, the model with the more favorable relative fit index values is generally considered to fit the data comparatively better. Absolute fit indices reflect the degree of absolute fit between the model and the data; if they fall within an acceptable range, the model fits the data well.
'TestAnaAPP' utilizes the 'mirt' package for parameter estimation and calculates relative fit indices, including AIC, SABIC, HQ, BIC, and log-likelihood. Smaller values of AIC, SABIC, HQ, and BIC indicate better model fit, whereas a larger log-likelihood indicates better fit. To use relative fit indices for IRT model selection, you can fit several models in 'TestAnaAPP' and compare the corresponding relative fit index values.
```{r}
fmt_tra_other(ft = MIRT_modelfit_relat)
```
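For readers who wish to reproduce such a comparison outside the app, relative fit indices can be obtained directly from the 'mirt' package. A minimal sketch, assuming the uploaded `Response` data are dichotomous and using a unidimensional specification and illustrative item types:

```r
library(mirt)

# Fit two candidate models to the same data (settings are illustrative).
fit_2pl <- mirt(Response, model = 1, itemtype = "2PL", verbose = FALSE)
fit_3pl <- mirt(Response, model = 1, itemtype = "3PL", verbose = FALSE)

# anova() reports AIC, SABIC, HQ, BIC, and the log-likelihood side by side.
anova(fit_2pl, fit_3pl)

# Individual indices can also be extracted from a fitted model.
extract.mirt(fit_2pl, what = "AIC")
```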
'TestAnaAPP' generates absolute fit indices, including M2, RMSEA, SRMSR, TLI, and CFI.
```{r}
fmt_tra_other(ft = MIRT_modelfit)
```
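Absolute fit indices can likewise be computed from a fitted 'mirt' model; a minimal sketch, assuming a hypothetical fitted model object `fit`:

```r
# M2() returns the M2 statistic together with RMSEA, SRMSR, TLI, and CFI.
mirt::M2(fit)
```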
'TestAnaAPP' provides the Q3 and LD-X2 statistics for testing local independence in IRT. Larger absolute values indicate stronger dependence between two items. Because cutoff standards vary across studies, no uniform thresholds for Q3 and LD-X2 are provided; readers are advised to refer to professional books or papers when interpreting these statistics.
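For reference, both statistics can be computed from a fitted 'mirt' model (here a hypothetical object `fit`):

```r
library(mirt)

residuals(fit, type = "Q3")  # Q3: correlations between item residuals
residuals(fit, type = "LD")  # LD-X2: pairwise local dependence statistics
```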
For the independence test in this analysis, `r MIRT_select_independent` was selected as the statistic.
```{r}
# Color Q3 values whose absolute value exceeds the user-defined threshold;
# the diagonal entries (Q3 = 1) stay black.
Q3_color <- function(x, value = MIRTreport_Q3_highlight) {
  out <- rep("black", length(x))
  out[abs(x) > value] <- "red"
  out[abs(x) == 1] <- "black"
  out
}

MIRT_Q3_print <- cbind("Item" = colnames(MIRT_Q3), MIRT_Q3) %>%
  as.data.frame()
fmt_tra_other(MIRT_Q3_print, reset_width = FALSE) %>%
  flextable::color(j = 2:ncol(MIRT_Q3_print), color = Q3_color)
```
Note: Red indicates that the absolute value of the Q3 statistic exceeds the threshold you set in 'TestAnaAPP'.
The item fit test assesses whether there is a significant disparity between the responses predicted by the model and the actual responses. 'TestAnaAPP' offers the $X^2$ statistic for conducting the item fit test. A $P$ value below 0.05 suggests a significant difference between the model's predicted values and the actual data, and an RMSEA value exceeding 0.1 likewise indicates a substantial disparity. A comprehensive judgment can be made from the testing context and these statistical indicators; selecting an alternative model for fitting may yield different results.
```{r}
fmt_tra_other(MIRT_itemfit)
```
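For reference, a minimal sketch of an item fit test with the 'mirt' package, assuming a hypothetical fitted model object `fit`; S-X2 is one of several item fit statistics 'mirt' offers:

```r
# Returns S_X2, its degrees of freedom, RMSEA, and the p value per item.
mirt::itemfit(fit, fit_stats = "S_X2")
```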
Item parameters serve as crucial indicators for assessing item quality. The criteria below, derived from *The Basics of Item Response Theory Using R*, can be used to evaluate the item discrimination values of the test under analysis.
`r bold_print("Table: Labels for Item Discrimination Values")``
| Verbal label | Range of values | Typical value |
|:------------:|:---------------:|:-------------:|
| None         | 0               | 0.00          |
| Very low     | 0.01-0.34       | 0.18          |
| Low          | 0.35-0.64       | 0.50          |
| Moderate     | 0.65-1.34       | 1.00          |
| High         | 1.35-1.69       | 1.50          |
| Very high    | >1.70           | 2.00          |
| Perfect      | $+\infty$       | $+\infty$     |
In this analysis, the `r model` model was selected, and `r MIRT_est_method` was used for parameter estimation. Generally, discrimination and difficulty are assessed using $MDISC_i$ and $\beta_i$. Therefore, 'TestAnaAPP' transforms the $\alpha_{iq}$ and $d_i$ parameters into $MDISC_i$ and $\beta_i$, respectively. In a between-item test, $MDISC_i$ equals the absolute value of $\alpha_{iq}$, since each item has only one nonzero slope. To obtain the $d_i$ parameter in the formulas above, users can compute $d_i = -\beta_i \times MDISC_i$.
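A minimal sketch of this transformation using 'mirt' output, assuming a hypothetical fitted dichotomous model object `fit`:

```r
library(mirt)

# Slopes (a1, ..., aQ) and intercepts (d) from the fitted model.
par_mat <- coef(fit, simplify = TRUE)$items
slopes  <- par_mat[, grep("^a", colnames(par_mat)), drop = FALSE]
d       <- par_mat[, "d"]

MDISC <- sqrt(rowSums(slopes^2))  # multidimensional discrimination
beta  <- -d / MDISC               # multidimensional difficulty
d_back <- -beta * MDISC           # recovers the original intercepts
```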
The item parameters are as follows:
```{r}
# Color discrimination values that fall below the user-defined threshold.
disc_color <- function(x, value = MIRTreport_alpha_highlight) {
  out <- rep("black", length(x))
  out[x < value] <- "red"
  out
}

MIRT_itempar_print <- cbind("Item" = MIRT_itemfit[, 1], MIRT_itempar)
fmt_tra_other(MIRT_itempar_print) %>%
  flextable::color(j = colnames(MIRT_itempar_print) %>%
                     stringr::str_which(pattern = "Discrimination"),
                   color = disc_color)
```
Note: Red indicates that the item discrimination value is below the threshold you set in 'TestAnaAPP'.
IRT places item difficulty and participants' latent traits on the same scale, making them directly comparable, which is a significant advantage over CTT. The Wright map serves as a visual tool for comparing the distributions of item difficulty and latent traits. By comparing these two distributions, you can judge how difficult the test is relative to the subjects and identify the harder and easier items.
It is important to note that the test can effectively differentiate the subjects' latent trait levels only when it possesses appropriate difficulty. Otherwise, ceiling or floor effects may arise, whereby the measurement tool inaccurately assesses the subjects' true abilities, resulting in all subjects receiving either high or low scores.
For multidimensional IRT models, 'TestAnaAPP' displays the Wright map for each dimension on the interactive interface. You can download any of the `r mode$F_n` Wright maps from the interactive interface of 'TestAnaAPP'.
```{r}
print(MIRT_wright)
```
The Item Characteristic Curve (ICC) visualizes the probability of subjects obtaining a specific score based on the predictions of the statistical model.
It allows us to observe the probability of subjects with different levels of latent traits obtaining a specific score on each item.
Meanwhile, it provides insights into the characteristics of items for subjects with varying levels of latent traits.
Detailed information on interpreting the ICC can be found in professional books.
Because each item in a between-item test measures a single latent trait, 'TestAnaAPP' constructs each item's ICC figure against its corresponding trait, which is an acceptable simplification.
```{r}
print(MIRT_ICC)
```
Item information represents the level of measurement error associated with each item. Higher values indicate a greater contribution of the item to the accurate measurement of the latent trait of the subjects. The value of item information is often related to the discriminatory power and difficulty level of the item. The Item Information Curve (IIC) visually represents the item information and allows readers to assess the contribution of the items to the accurate measurement of a specific level of latent trait.
Professional books can provide guidance on interpreting the IIC curve for those seeking to gain a comprehensive understanding.
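For readers who want item information values outside the app, a minimal sketch with the 'mirt' package, assuming for brevity a hypothetical fitted unidimensional model object `fit`:

```r
library(mirt)

theta_grid <- matrix(seq(-4, 4, by = 0.1))   # grid of latent trait values
item1 <- extract.item(fit, item = 1)         # pull out the first item
info1 <- iteminfo(item1, Theta = theta_grid) # item information at each theta
```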
As with the ICC figures, in a between-item test the IIC figures are based on the single latent trait that each item measures, which is an acceptable simplification.
```{r}
print(MIRT_IIC)
```
Test information is a quantitative measure of test accuracy in IRT, calculated by summing the information obtained from each item. A higher test information value indicates a more precise measurement of the latent trait for the subjects being assessed. 'TestAnaAPP' offers test information curves (TIC) and measurement error curves as graphical representations to demonstrate the quality of the test. A consistent conversion relationship exists between test information ($I(\theta)$) and measurement error ($SE$), which can be expressed by the following formula:
$$SE = \frac{1}{\sqrt{I(\theta)}}$$
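A quick worked example of this conversion, using hypothetical information values:

```r
I_theta <- c(4, 9, 16)   # hypothetical test information values
SE <- 1 / sqrt(I_theta)  # 0.500, 0.333, 0.250
```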
Since the test is a between-item test, where each item measures only one of several latent traits, it is not feasible to visually represent $\theta$ in multiple dimensions. Therefore, 'TestAnaAPP' is designed to display one dimension of the TIC at a time. Below is the TIC that you have chosen to display in the user interface.
```{r}
if (mode$is.within_item == TRUE) {
  NULL %>% print()  # within-item tests are not plotted here
} else {
  for (i in 1:mode$F_n) {
    TIC_dim <- mode$F_names[i]
    # Theta grid and test information for this dimension.
    sim_theta1_infor1 <- item_info1[, c(TIC_dim, paste0(TIC_dim, "infor"))]
    F_name <- TIC_dim
    # Left axis: test information; right axis: measurement error 1/sqrt(I).
    plotrix::twoord.plot(
      lx = sim_theta1_infor1[, 1], ly = sim_theta1_infor1[, 2],
      rx = sim_theta1_infor1[, 1], ry = 1 / sqrt(sim_theta1_infor1[, 2]),
      main = paste0("Test Information and Measurement Error of ", F_name),
      ylab = "Test information", rylab = "Measurement error",
      xlab = "Latent Trait", rcol = "red",
      lytickpos = seq(0, max(sim_theta1_infor1[, 2]),
                      ceiling(max(sim_theta1_infor1[, 2]) / 5)),
      lylim = c(0, max(sim_theta1_infor1[, 2]) + 0.5),
      type = c("l", "p"), rpch = 1
    )
    # Label the maximum of the test information curve.
    text(x = sim_theta1_infor1[which.max(sim_theta1_infor1[, 2]), 1],
         y = max(sim_theta1_infor1[, 2]),
         labels = paste(round(max(sim_theta1_infor1[, 2]), 3)))
  }
}
```
It is important to note that users can download the item information and test information from the user interface. This allows them to create custom figures using the downloaded files.
Note: 'TestAnaAPP' is mainly designed to provide convenient item and test analysis for practitioners in psychological and educational measurement. The purpose of this report is to present the analysis results completely and clearly; providing a universal and optimal interpretation of the results is not its primary objective. Therefore, while we hope users benefit from the user-friendly nature of this program, we also encourage a cautious attitude toward the interpretation of results to avoid potential biases. Additionally, if you discover any errors in this program, we would greatly appreciate you contacting us (jiangyouxiang34@163.com).
\newpage