IRT without the normality assumption"
In IRTest: Parameter Estimation of Item Response Theory with Estimation of Latent Distribution

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

library(IRTest)
library(ggplot2)
library(gridExtra)

0. Introduction

IRTest is a useful tool for $\mathcal{\color{red}{IRT}}$ (item response theory) parameter $\mathcal{\color{red}{est}}$imation, especially when the violation of normality assumption on latent distribution is suspected.
IRTest deals with uni-dimensional latent variable.
For missing values, IRTest adopts full information maximum likelihood (FIML) approach.
In IRTest, including the conventional usage of Gaussian distribution, several methods are available for estimation of latent distribution:
- empirical histogram method,
- two-component Gaussian mixture distribution,
- Davidian curve,
- kernel density estimation,
- log-linear smoothing.

Installation

The CRAN version of IRTest can be installed on R-console with:

install.packages("IRTest")

For the development version, it can be installed on R-console with:

devtools::install_github("SeewooLi/IRTest")

Functions

Followings are the functions of IRTest.

IRTest_Dich is the estimation function when all items are dichotomously scored.
IRTest_Poly is the estimation function when all items are polytomously scored.
IRTest_Cont is the estimation function when all items are continuously scored.
IRTest_Mix is the estimation function for a mixed-format test, a test comprising both dichotomous item(s) and polytomous item(s).
factor_score estimates factor scores of examinees.
coef_se returns standard errors of item parameter estimates.
best_model selects the best model using an evaluation criterion.
item_fit tests the statistical fit of all items individually.
inform_f_item calculates the information value(s) of an item.
inform_f_test calculates the information value(s) of a test.
plot_item draws item response function(s) of an item.
reliability calculates marginal reliability coefficient of IRT.
latent_distribution returns evaluated PDF value(s) of an estimated latent distribution.
DataGeneration generates several objects that can be useful for computer simulation studies. Among these are simulated item parameters, ability parameters and the corresponding item-response data.
dist2 is a probability density function of two-component Gaussian mixture distribution.
original_par_2GM converts re-parameterized parameters of two-component Gaussian mixture distribution into original parameters.
cat_clps recommends category collapsing based on item parameters (or, equivalently, item response functions).
recategorize implements the category collapsing.
adaptive_test estimates ability parameter(s) which can be utilized for computer adaptive testing (CAT).
For S3 methods, anova, coef, logLik, plot, print, and summary are available.

\break

1. Dichotomous items

Preparing data

The function DataGeneration can be used in a preparation step. This function returns a set of artificial data and the true parameters underlying the data.

Alldata <- DataGeneration(model_D = 2,
                          N=1000,
                          nitem_D = 15,
                          latent_dist = "2NM",
                          d = 1.664,
                          sd_ratio = 2,
                          prob = 0.3)

data <- Alldata$data_D
theta <- Alldata$theta
colnames(data) <- paste0("item", 1:15)

Analysis (parameter estimation)

Mod1 <- IRTest_Dich(data = data,
                    model = 2,
                    latent_dist = "LLS",
                    h=4)

Results

### Summary
summary(Mod1)

### Log-likelihood
logLik(Mod1)

### The estimated item parameters
coef(Mod1)

### Standard errors of the item parameter estimates
coef_se(Mod1)

### The estimated ability parameters
fscore <- factor_score(Mod1, ability_method = "MLE")
plot(theta, fscore$theta)
abline(b=1, a=0)

### Standard errors of ability parameter estimates
plot(fscore$theta, fscore$theta_se)

The result of latent distribution estimation

plot(Mod1, mapping = aes(colour="Estimated"), linewidth = 1) +
  lims(y = c(0, .75))+
  geom_line(
    mapping=aes(
      x=seq(-6,6,length=121), 
      y=dist2(seq(-6,6,length=121), prob = .3, d = 1.664, sd_ratio = 2), 
      colour="True"),
    linewidth = 1)+
  labs(title="The estimated latent density using '2NM'", colour= "Type")+
  theme_bw()

Item response function

p1 <- plot_item(Mod1,1)
p2 <- plot_item(Mod1,4)
p3 <- plot_item(Mod1,8)
p4 <- plot_item(Mod1,10)
grid.arrange(p1, p2, p3, p4, ncol=2, nrow=2)

Item fit

item_fit(Mod1)

Reliability

reliability(Mod1)

Posterior distributions for the examinees

Each examinee's posterior distribution is identified in the E-step of the estimation algorithm (i.e., EM algorithm). Posterior distributions can be found in Mod1$Pk.

set.seed(1)
selected_examinees <- sample(1:1000,6)
post_sample <- 
  data.frame(
    X = rep(seq(-6,6, length.out=121),6), 
    prior = rep(Mod1$Ak/(Mod1$quad[2]-Mod1$quad[1]), 6),
    posterior = 10*c(t(Mod1$Pk[selected_examinees,])), 
    ID = rep(paste("examinee", selected_examinees), each=121)
    )

ggplot(data=post_sample, mapping=aes(x=X))+
  geom_line(mapping=aes(y=posterior, group=ID, color='Posterior'))+
  geom_line(mapping=aes(y=prior, group=ID, color='Prior'))+
  labs(title="Posterior densities for selected examinees", x=expression(theta), y='density')+
  facet_wrap(~ID, ncol=2)+
  theme_bw()

Test information function

ggplot()+
  stat_function(
    fun = inform_f_test,
    args = list(Mod1)
  )+ 
  stat_function(
    fun=inform_f_item,
    args = list(Mod1, 1),
    mapping = aes(color="Item 1")
  )+
  stat_function(
    fun=inform_f_item,
    args = list(Mod1, 2),
    mapping = aes(color="Item 2")
  )+
  stat_function(
    fun=inform_f_item,
    args = list(Mod1, 3),
    mapping = aes(color="Item 3")
  )+
  stat_function(
    fun=inform_f_item,
    args = list(Mod1, 4),
    mapping = aes(color="Item 4")
  )+
  stat_function(
    fun=inform_f_item,
    args = list(Mod1, 5),
    mapping = aes(color="Item 5")
  )+
  lims(x=c(-6,6))+
  labs(title="Test information function", x=expression(theta), y='information')+
  theme_bw()

\break

2. Polytomous items

Preparing data

Alldata <- DataGeneration(model_P = "GRM",
                          categ = rep(c(3,7), each = 7),
                          N=1000,
                          nitem_P = 14,
                          latent_dist = "2NM",
                          d = 1.664,
                          sd_ratio = 2,
                          prob = 0.3)

data <- Alldata$data_P
theta <- Alldata$theta
colnames(data) <- paste0("item", 1:14)

Analysis (parameter estimation)

Mod1 <- IRTest_Poly(data = data,
                    model = "GRM",
                    latent_dist = "KDE")

Results

### Summary
summary(Mod1)

### Log-likelihood
logLik(Mod1)

### The estimated item parameters
coef(Mod1)

### Standard errors of the item parameter estimates
coef_se(Mod1)

### The estimated ability parameters
fscore <- factor_score(Mod1, ability_method = "MLE")
plot(theta, fscore$theta)
abline(b=1, a=0)

### Standard errors of ability parameter estimates
plot(fscore$theta, fscore$theta_se)

The result of latent distribution estimation

plot(Mod1, mapping = aes(colour="Estimated"), linewidth = 1) +
  stat_function(
    fun = dist2,
    args = list(prob = .3, d = 1.664, sd_ratio = 2),
    mapping = aes(colour = "True"),
    linewidth = 1) +
  lims(y = c(0, .75)) + 
  labs(title="The estimated latent density using '2NM'", colour= "Type")+
  theme_bw()

Item response function

p1 <- plot_item(Mod1,1)
p2 <- plot_item(Mod1,4)
p3 <- plot_item(Mod1,8)
p4 <- plot_item(Mod1,10)
grid.arrange(p1, p2, p3, p4, ncol=2, nrow=2)

Item fit

item_fit(Mod1)

Reliability

reliability(Mod1)

Posterior distributions for the examinees

Each examinee's posterior distribution is identified in the E-step of the estimation algorithm (i.e., EM algorithm). Posterior distributions can be found in Mod1$Pk.

set.seed(1)
selected_examinees <- sample(1:1000,6)
post_sample <- 
  data.frame(
    X = rep(seq(-6,6, length.out=121),6), 
    prior = rep(Mod1$Ak/(Mod1$quad[2]-Mod1$quad[1]), 6),
    posterior = 10*c(t(Mod1$Pk[selected_examinees,])), 
    ID = rep(paste("examinee", selected_examinees), each=121)
    )

ggplot(data=post_sample, mapping=aes(x=X))+
  geom_line(mapping=aes(y=posterior, group=ID, color='Posterior'))+
  geom_line(mapping=aes(y=prior, group=ID, color='Prior'))+
  labs(title="Posterior densities for selected examinees", x=expression(theta), y='density')+
  facet_wrap(~ID, ncol=2)+
  theme_bw()

Test information function

ggplot()+
  stat_function(
    fun = inform_f_test,
    args = list(Mod1)
  )+ 
  stat_function(
    fun=inform_f_item,
    args = list(Mod1, 1),
    mapping = aes(color="Item 1 (3 cats)")
  )+
  stat_function(
    fun=inform_f_item,
    args = list(Mod1, 2),
    mapping = aes(color="Item 2 (3 cats)")
  )+
  stat_function(
    fun=inform_f_item,
    args = list(Mod1, 3),
    mapping = aes(color="Item 3 (3 cats)")
  )+
  stat_function(
    fun=inform_f_item,
    args = list(Mod1, 8),
    mapping = aes(color="Item 8 (7 cats)")
  )+
  stat_function(
    fun=inform_f_item,
    args = list(Mod1, 9),
    mapping = aes(color="Item 9 (7 cats)")
  )+
  stat_function(
    fun=inform_f_item,
    args = list(Mod1, 10, "p"),
    mapping = aes(color="Item10 (7 cats)")
  )+
  lims(x=c(-6,6))+
  labs(title="Test information function", x=expression(theta), y='information')+
  theme_bw()

\break

3. Continuous items

Statistical details of the continuous IRT

Beta distribution (click)

$$ \begin{align} f(x) &= \frac{1}{Beta(\alpha, \beta)}x^{\alpha-1}(1-x)^{(\beta-1)} \ &= \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}x^{\alpha-1}(1-x)^{(\beta-1)} \end{align} $$

$E(x)=\frac{\alpha}{\alpha+\beta}$ and $Var(x)=\frac{\alpha\beta}{(\alpha+\beta)^2(\alpha+\beta=1)}$ If we reparameterize $\mu=\frac{\alpha}{\alpha+\beta}$ and $\nu=\alpha+\beta$,

$$ f(x) = \frac{\Gamma(\nu)}{\Gamma(\mu\nu)\Gamma(\nu(1-\mu))}x^{\mu\nu-1}(1-x)^{(\nu(1-\mu)-1)} $$ No Jacobian transformation required since $\mu$ and $\nu$ are parameters of the $f(x)$, not variables.

Useful equations (click)

$\psi(\bullet)$ and $\psi_1(\bullet)$ denote for digamma and trigamma functions, respectively.

$$ \begin{align} E[\log{x}] &= \int_{0}^{1}{\log{x}f(x) \,dx} \ &= \int_{0}^{1}{\log{x} \frac{1}{Beta(\alpha, \beta)}x^{\alpha-1}(1-x)^{(\beta-1)} \,dx} \ &= \frac{1}{Beta(\alpha, \beta)} \int_{0}^{1}{\log{(x)} x^{\alpha-1}(1-x)^{(\beta-1)} \,dx} \ &= \frac{1}{Beta(\alpha, \beta)} \int_{0}^{1}{\frac{\partial x^{\alpha-1}(1-x)^{(\beta-1)}}{\partial \alpha} \,dx} \ &= \frac{1}{Beta(\alpha, \beta)} \frac{\partial}{\partial \alpha}\int_{0}^{1}{ x^{\alpha-1}(1-x)^{(\beta-1)} \,dx} \ &= \frac{1}{Beta(\alpha, \beta)} \frac{\partial Beta(\alpha, \beta)}{\partial \alpha} \ &= \frac{\partial \log{[Beta(\alpha, \beta)]}}{\partial \alpha} \ &= \frac{\partial \log{[\Gamma(\alpha)]}}{\partial \alpha} - \frac{\partial \log{[\Gamma(\alpha + \beta)]}}{\partial \alpha} \ &= \psi(\alpha) - \psi(\alpha+\beta) \end{align} $$

Similarly, $E[\log{(1-x)}]=\psi(\beta) - \psi(\alpha+\beta)$.

Furthermore, using $\frac{\partial Beta(\alpha,\beta)}{\partial \alpha} = Beta(\alpha,\beta)\left(\psi(\alpha)-\psi(\alpha+\beta)\right)$ and $\frac{\partial^2 Beta(\alpha,\beta)}{\partial \alpha^2} = Beta(\alpha,\beta)\left(\psi(\alpha)-\psi(\alpha+\beta)\right)^2 + Beta(\alpha,\beta)\left(\psi_1(\alpha)-\psi_1(\alpha+\beta)\right)$,

$$ \begin{align} E\left[(\log{x})^2\right] &= \frac{1}{Beta(\alpha, \beta)} \frac{\partial^2 Beta(\alpha, \beta)}{\partial \alpha^2} \ &= \left(\psi(\alpha)-\psi(\alpha+\beta)\right)^2 + \left(\psi_1(\alpha)-\psi_1(\alpha+\beta)\right) \end{align} $$

This leads to,

$$ \begin{align} Var\left[\log{x}\right] &= E\left[(\log{x})^2\right] - E\left[\log{x}\right]^2 \ &=\psi_1(\alpha)-\psi_1(\alpha+\beta) \end{align} $$

Continuous IRT (click)

Expected item response

$$ \mu = \frac{e^{a(\theta -b)}}{1+e^{a(\theta -b)}} \ \frac{\partial \mu}{\partial \theta} = a\mu(1-\mu) \ \frac{\partial \mu}{\partial a} = (\theta - b)\mu(1-\mu) \ \frac{\partial \mu}{\partial b} = -a\mu(1-\mu) \ \frac{\partial \mu}{\partial \nu} = 0 $$

Probability of a response

$$ f(x)=P(x|\, \theta, a, b, \nu) = \frac{\Gamma(\nu)}{\Gamma(\mu\nu)\Gamma(\nu(1-\mu))} x^{\mu\nu-1} (1-x)^{\nu(1-\mu)-1} \ $$

Gradient

$$ \log{f} = \log{\Gamma(\nu)}-\log{\Gamma(\mu\nu)}-\log{\Gamma(\nu(1-\mu))} + (\mu\nu-1)\log{x} + (\nu(1-\mu)-1) \log{(1-x)} $$

$$ \frac{\partial \log{f}}{\partial \theta} = a\nu\mu(1-\mu)\left[-\psi{(\mu\nu)}+\psi{(\nu(1-\mu))}+ \log{\left(\frac{x}{1-x}\right)}\right] $$

Information

$$ E\left[ \left(\frac{\partial \log{f}}{\partial \theta}\right)^2 \right] = (a\nu\mu(1-\mu))^2\left[E\left[ \log{\left(\frac{x}{1-x}\right)^2}\right] -2 \left(\psi{(\mu\nu)}-\psi{(\nu(1-\mu))}\right )E\left[ \log{\left(\frac{x}{1-x}\right)}\right] +\left(\psi{(\mu\nu)}-\psi{(\nu(1-\mu))}\right )^2 \right] $$

$$ \begin{align} E\left[ \log{\left(\frac{x}{1-x}\right)^2}\right] &= E\left[ \log{\left(x\right)^2}\right] -2 E\left[ \log{\left(x\right)}\log{\left(1-x\right)}\right] + E\left[ \log{\left(1-x\right)^2}\right] \ &= Var\left[ \log{\left(x\right)}\right]+E\left[ \log{\left(x\right)}\right]^2 \ &\qquad -2 Cov\left[ \log{\left(x\right)}\log{\left(1-x\right)}\right]-2E\left[ \log{\left(x\right)}\right]E\left[ \log{\left(1-x\right)}\right] \ &\qquad + Var\left[ \log{\left(1-x\right)}\right]+E\left[ \log{\left(1-x\right)}\right]^2 \ &= \psi_{1}(\alpha)-\psi_{1}(\alpha+\beta) +E\left[ \log{\left(x\right)}\right]^2 \ &\qquad +2 \psi_{1}(\alpha+\beta)-2E\left[ \log{\left(x\right)}\right]E\left[ \log{\left(1-x\right)}\right] \ &\qquad + \psi_{1}(\beta)-\psi_{1}(\alpha+\beta)+E\left[ \log{\left(1-x\right)}\right]^2 \ &= \psi_{1}(\alpha)-\psi_{1}(\alpha+\beta) +\left[ \psi(\alpha)-\psi(\alpha+\beta)\right]^2 \ &\qquad +2 \psi_{1}(\alpha+\beta)-2 \left(\psi(\alpha)-\psi(\alpha+\beta)\right)\left(\psi(\beta)-\psi(\alpha+\beta)\right) \ &\qquad + \psi_{1}(\beta)-\psi_{1}(\alpha+\beta)+\left[\psi(\beta)-\psi(\alpha+\beta)\right]^2 \ &= \psi_{1}(\alpha) +\psi_{1}(\beta) +\left[\psi(\alpha)-\psi(\beta)\right]^2 \end{align} $$

$$ \begin{align} E\left[\left(\frac{\partial \log{f}}{\partial \theta}\right)^2 \right] & = (a\nu\mu(1-\mu))^2\left[E\left[ \log{\left(\frac{x}{1-x}\right)^2}\right] -2 \left(\psi{(\mu\nu)}-\psi{(\nu(1-\mu))}\right )E\left[ \log{\left(\frac{x}{1-x}\right)}\right] +\left(\psi{(\mu\nu)}-\psi{(\nu(1-\mu))}\right )^2 \right] \ &= (a\nu\mu(1-\mu))^2\left[\psi_{1}(\alpha) +\psi_{1}(\beta)
+\left[\psi(\alpha)-\psi(\beta)\right]^2 -2 \left(\psi{(\alpha)}-\psi{(\beta)}\right )\left(\psi{(\alpha)}-\psi{(\beta)}\right ) +\left(\psi{(\alpha)}-\psi{(\beta)}\right )^2 \right] \ &= (a\nu\mu(1-\mu))^2\left[\psi_{1}(\alpha) +\psi_{1}(\beta)
\right] \ \end{align} $$

$$ I(\theta) = E\left[\left(\frac{\partial \log{f}}{\partial \theta}\right)^2 \right] = (a\nu\mu(1-\mu))^2\left[\psi_{1}(\alpha) +\psi_{1}(\beta)\right] $$

Log-likelihood in the M-step of the MML-EM procedure

Marginal log-likelihood of an item can be expressed as follows:

$$ \ell = \sum_{j} \sum_{q}\gamma_{jq}\log{L_{jq}}, $$

where $\gamma_{jq}=E\left[\Pr\left(\theta_j \in \theta_{q}^{}\right)\right]$ is the expected probability of respondent $j$'s ability ($\theta_j$) belonging to the $\theta_{q}^{}$ of the quadrature scheme and is calculated at the E-step of the MML-EM procedure, and $L_{jq}$ is the likelihood of respondent $j$'s response at $\theta_{q}^{*}$ for the item of current interest.

First derivative of the likelihood

$$ \frac{\partial \ell}{\partial a} = \sum_{q} \left(\theta_{q}-b\right)\nu\mu_{q}\left(1-\mu_{q}\right)\left[ S_{1q}-S_{2q}-f_q\left[ \psi(\mu_{q}\nu)-\psi(\nu(1-\mu_{q})) \right] \right] \ \frac{\partial \ell}{\partial b} = -a\sum_{q}\nu\mu_{q}\left(1-\mu_{q}\right)\left[ S_{1q}-S_{2q}-f_q\left[ \psi(\mu_{q}\nu)-\psi(\nu(1-\mu_{q})) \right] \right] \ \frac{\partial \ell}{\partial \nu} = N\psi(\nu) +\sum_{q}\left[ \mu_{q}(S_{1q}-f_q\psi(\mu_{q}\nu)) + (1-\mu_{q})(S_{2q}-f_q\psi(\nu(1-\mu_{q}))) \right] $$

where $S_{1q} = \sum_{j}{\gamma_{jq}\log{x_j}}$ and $S_{2q} = \sum_{j}{\gamma_{jq}\log{(1-x_j)}}$. Since $E_q[S_{1q}]=f_q\left[\psi(\mu_{q}\nu))-\psi(\nu)\right]$ and $E_q[S_{2q}]=f_q\left[\psi(\nu(1-\mu_{q})))-\psi(\nu)\right]$, the expected values of the first derivatives are 0.

To keep $\nu$ positive, let $\nu = \exp{\xi}$; $\frac{\partial\nu}{\partial\xi}=\exp{\xi}=\nu$.

$$ \frac{\partial \ell}{\partial \xi} = N\nu\psi(\nu) +\nu\sum_{q}\left[ \mu_{q}(S_{1q}-f_q\psi(\mu_{q}\nu)) + (1-\mu_{q})(S_{2q}-f_q\psi(\nu(1-\mu_{q}))) \right] $$

Second derivative of the likelihood

$$ E\left( \frac{\partial^2\ell}{\partial a^2}\right) = -\sum_{q} \left{\left(\theta_{q}-b\right)\nu\mu_{q}\left(1-\mu_{q}\right)\right}^{2}f_{q}\left[ \psi_{1}(\mu_{q}\nu)+\psi_{1}(\nu(1-\mu_{q})) \right] \ E\left(\frac{\partial^2\ell}{\partial a \partial b}\right) = a\sum_{q} \left(\theta_{q}-b\right)\left{\nu\mu_{q}\left(1-\mu_{q}\right)\right}^{2}f_{q}\left[ \psi_{1}(\mu_{q}\nu)+\psi_{1}(\nu(1-\mu_{q})) \right] \ E\left(\frac{\partial^2\ell}{\partial a \partial \nu}\right) = -\sum_{q} \left(\theta_{q}-b\right)\nu\mu_{q}\left(1-\mu_{q}\right)f_q\left[ \mu_{q}\psi_{1}(\mu_{q}\nu)-(1-\mu_{q})\psi_{1}(\nu(1-\mu_{q})) \right] \ E\left(\frac{\partial^2\ell}{\partial b^2}\right) = -a^{2}\sum_{q} \left{\nu\mu_{q}\left(1-\mu_{q}\right)\right}^{2}f_{q}\left[ \psi_{1}(\mu_{q}\nu)+\psi_{1}(\nu(1-\mu_{q})) \right] \ E\left(\frac{\partial^2\ell}{\partial b \partial \nu}\right) = a\sum_{q} \nu\mu_{q}\left(1-\mu_{q}\right)f_q\left[ \mu_{q}\psi_{1}(\mu_{q}\nu)-(1-\mu_{q})\psi_{1}(\nu(1-\mu_{q})) \right] \ E\left(\frac{\partial^2\ell}{\partial \nu^2}\right) = N\psi_{1}(\nu) - \sum_{q}f_q\left[ \mu_{q}^{2}\psi_{1}(\mu_{q}\nu)+(1-\mu_{q})^{2}\psi_{1}(\nu(1-\mu_{q})) \right] $$

If we use $\xi$ instead of $\nu$,

$$ E\left(\frac{\partial^2\ell}{\partial a \partial \xi}\right) = -\sum_{q} \left(\theta_{q}-b\right)\nu^{2}\mu_{q}\left(1-\mu_{q}\right)f_q\left[ \mu_{q}\psi_{1}(\mu_{q}\nu)-(1-\mu_{q})\psi_{1}(\nu(1-\mu_{q})) \right] \ E\left(\frac{\partial^2\ell}{\partial b \partial \xi}\right) = a\sum_{q} \nu^{2}\mu_{q}\left(1-\mu_{q}\right)f_q\left[ \mu_{q}\psi_{1}(\mu_{q}\nu)-(1-\mu_{q})\psi_{1}(\nu(1-\mu_{q})) \right] \ E\left(\frac{\partial^2\ell}{\partial \xi^2}\right) = N\nu^{2}\psi_{1}(\nu) - \nu^{2}\sum_{q}f_q\left[ \mu_{q}^{2}\psi_{1}(\mu_{q}\nu)+(1-\mu_{q})^{2}\psi_{1}(\nu(1-\mu_{q})) \right] $$

\ \

Preparing data

The function DataGeneration can be used in a preparation step. This function returns a set of artificial data and the true parameters underlying the data.

Alldata <- DataGeneration(N=1000,
                          nitem_C = 8,
                          latent_dist = "2NM",
                          a_l = .3, 
                          a_u = .7,
                          d = 1.664,
                          sd_ratio = 2,
                          prob = 0.3)

data <- Alldata$data_C
theta <- Alldata$theta
colnames(data) <- paste0("item", 1:8)

Analysis (parameter estimation)

Mod1 <- IRTest_Cont(data = data,
                    latent_dist = "KDE")

Results

### Summary
summary(Mod1)

### Log-likelihood
logLik(Mod1)

### The estimated item parameters
coef(Mod1)

### The estimated ability parameters
fscore <- factor_score(Mod1, ability_method = "WLE")
plot(theta, fscore$theta)
abline(b=1, a=0)

### Standard errors of ability parameter estimates
plot(fscore$theta, fscore$theta_se)

The result of latent distribution estimation

plot(Mod1, mapping = aes(colour="Estimated"), linewidth = 1) +
  lims(y = c(0, .75))+
  geom_line(
    mapping=aes(
      x=seq(-6,6,length=121), 
      y=dist2(seq(-6,6,length=121), prob = .3, d = 1.664, sd_ratio = 2), 
      colour="True"),
    linewidth = 1)+
  labs(title="The estimated latent density using '2NM'", colour= "Type")+
  theme_bw()

Item response function

p1 <- plot_item(Mod1,1)
p2 <- plot_item(Mod1,2)
p3 <- plot_item(Mod1,3)
p4 <- plot_item(Mod1,4)
grid.arrange(p1, p2, p3, p4, ncol=2, nrow=2)

Reliability

reliability(Mod1)

Posterior distributions for the examinees

Each examinee's posterior distribution is identified in the E-step of the estimation algorithm (i.e., EM algorithm). Posterior distributions can be found in Mod1$Pk.

set.seed(1)
selected_examinees <- sample(1:1000,6)
post_sample <- 
  data.frame(
    X = rep(seq(-6,6, length.out=121),6), 
    prior = rep(Mod1$Ak/(Mod1$quad[2]-Mod1$quad[1]), 6),
    posterior = 10*c(t(Mod1$Pk[selected_examinees,])), 
    ID = rep(paste("examinee", selected_examinees), each=121)
    )

ggplot(data=post_sample, mapping=aes(x=X))+
  geom_line(mapping=aes(y=posterior, group=ID, color='Posterior'))+
  geom_line(mapping=aes(y=prior, group=ID, color='Prior'))+
  labs(title="Posterior densities for selected examinees", x=expression(theta), y='density')+
  facet_wrap(~ID, ncol=2)+
  theme_bw()

Test information function

ggplot()+
  stat_function(
    fun = inform_f_test,
    args = list(Mod1)
  )+ 
  stat_function(
    fun=inform_f_item,
    args = list(Mod1, 1),
    mapping = aes(color="Item 1")
  )+
  stat_function(
    fun=inform_f_item,
    args = list(Mod1, 2),
    mapping = aes(color="Item 2")
  )+
  stat_function(
    fun=inform_f_item,
    args = list(Mod1, 3),
    mapping = aes(color="Item 3")
  )+
  stat_function(
    fun=inform_f_item,
    args = list(Mod1, 4),
    mapping = aes(color="Item 4")
  )+
  stat_function(
    fun=inform_f_item,
    args = list(Mod1, 5),
    mapping = aes(color="Item 5")
  )+
  stat_function(
    fun=inform_f_item,
    args = list(Mod1, 6),
    mapping = aes(color="Item 6")
  )+
  stat_function(
    fun=inform_f_item,
    args = list(Mod1, 7),
    mapping = aes(color="Item 7")
  )+
  stat_function(
    fun=inform_f_item,
    args = list(Mod1, 8),
    mapping = aes(color="Item 8")
  )+
  lims(x=c(-6,6))+
  labs(title="Test information function", x=expression(theta), y='information')+
  theme_bw()

\break

4. Mixed-format test

Preparing data

As in the cases of dichotomous and polytomous items, the function DataGeneration can be used in the preparation step. This function returns artificial data and some useful objects for analysis (i.e., theta, data_D, item_D, data_P, & item_P).

Alldata <- DataGeneration(model_D = 2,
                          model_P = "GRM",
                          N=1000,
                          nitem_D = 10,
                          nitem_P = 5,
                          latent_dist = "2NM",
                          d = 1.664,
                          sd_ratio = 1,
                          prob = 0.5)

DataD <- Alldata$data_D
DataP <- Alldata$data_P
theta <- Alldata$theta
colnames(DataD) <- paste0("item", 1:10)
colnames(DataP) <- paste0("item", 1:5)

\ \ \

Analysis (parameter estimation)

Mod1 <- IRTest_Mix(data_D = DataD,
                   data_P = DataP,
                   model_D = "2PL",
                   model_P = "GRM",
                   latent_dist = "KDE")

\ \ \

Results

### Summary
summary(Mod1)

### Log-likelihood
logLik(Mod1)

### The estimated item parameters
coef(Mod1)

### Standard errors of the item parameter estimates
coef_se(Mod1)

### The estimated ability parameters
fscore <- factor_score(Mod1, ability_method = "MLE")
plot(theta, fscore$theta)
abline(b=1, a=0)

### Standard errors of ability parameter estimates
plot(fscore$theta, fscore$theta_se)

The result of latent distribution estimation

plot(Mod1, mapping = aes(colour="Estimated"), linewidth = 1) +
  stat_function(
    fun = dist2,
    args = list(prob = .5, d = 1.664, sd_ratio = 1),
    mapping = aes(colour = "True"),
    linewidth = 1) +
  lims(y = c(0, .75)) + 
  labs(title="The estimated latent density using '2NM'", colour= "Type")+
  theme_bw()

Item response function

p1 <- plot_item(Mod1,1, type="d")
p2 <- plot_item(Mod1,4, type="d")
p3 <- plot_item(Mod1,8, type="d")
p4 <- plot_item(Mod1,10, type="d")
grid.arrange(p1, p2, p3, p4, ncol=2, nrow=2)

p1 <- plot_item(Mod1,1, type="p")
p2 <- plot_item(Mod1,2, type="p")
p3 <- plot_item(Mod1,3, type="p")
p4 <- plot_item(Mod1,4, type="p")
grid.arrange(p1, p2, p3, p4, ncol=2, nrow=2)

Item fit

item_fit(Mod1)

Reliability

reliability(Mod1)

Posterior distributions for the examinees

Each examinee's posterior distribution is identified in the E-step of the estimation algorithm (i.e., EM algorithm). Posterior distributions can be found in Mod1$Pk.

set.seed(1)
selected_examinees <- sample(1:1000,6)
post_sample <- 
  data.frame(
    X = rep(seq(-6,6, length.out=121),6), 
    prior = rep(Mod1$Ak/(Mod1$quad[2]-Mod1$quad[1]), 6),
    posterior = 10*c(t(Mod1$Pk[selected_examinees,])), 
    ID = rep(paste("examinee", selected_examinees), each=121)
    )

ggplot(data=post_sample, mapping=aes(x=X))+
  geom_line(mapping=aes(y=posterior, group=ID, color='Posterior'))+
  geom_line(mapping=aes(y=prior, group=ID, color='Prior'))+
  labs(title="Posterior densities for selected examinees", x=expression(theta), y='density')+
  facet_wrap(~ID, ncol=2)+
  theme_bw()

Test information function

ggplot()+
  stat_function(
    fun = inform_f_test,
    args = list(Mod1)
  )+ 
  stat_function(
    fun=inform_f_item,
    args = list(Mod1, 1, "d"),
    mapping = aes(color="Dichotomous Item 1")
  )+
  stat_function(
    fun=inform_f_item,
    args = list(Mod1, 2, "d"),
    mapping = aes(color="Dichotomous Item 2")
  )+
  stat_function(
    fun=inform_f_item,
    args = list(Mod1, 3, "d"),
    mapping = aes(color="Dichotomous Item 3")
  )+
  stat_function(
    fun=inform_f_item,
    args = list(Mod1, 1, "p"),
    mapping = aes(color="Polytomous Item 1")
  )+
  stat_function(
    fun=inform_f_item,
    args = list(Mod1, 2, "p"),
    mapping = aes(color="Polytomous Item 2")
  )+
  stat_function(
    fun=inform_f_item,
    args = list(Mod1, 3, "p"),
    mapping = aes(color="Polytomous Item 3")
  )+
  lims(x=c(-6,6))+
  labs(title="Test information function", x=expression(theta), y='information')+
  theme_bw()

\break

5. Model comparison

Data generation and model fitting

data <- DataGeneration(N=1000,
                       nitem_D = 10,
                       latent_dist = "2NM",
                       d = 1.664,
                       sd_ratio = 2,
                       prob = 0.3)$data_D

model_fits <- list()
model_fits[[1]] <- IRTest_Dich(data)
model_fits[[2]] <- IRTest_Dich(data, latent_dist = "EHM")
model_fits[[3]] <- IRTest_Dich(data, latent_dist = "2NM")
model_fits[[4]] <- IRTest_Dich(data, latent_dist = "KDE")
for(i in 1:10){
  model_fits[[i+4]] <- IRTest_Dich(data, latent_dist = "DC", h = i)
}

names(model_fits) <- c("Normal", "EHM", "2NM", "KDM", paste0("DC", 1:10))

The best Davidian-curve model

do.call(what = "anova", args = model_fits[5:14])
do.call(what = "best_model", args = model_fits[5:14])

The best model overall

do.call(what = "best_model", args = c(model_fits[c(1:4,5)], criterion ="AIC"))

Any scripts or data that you put into this service are public.

IRTest documentation built on Oct. 4, 2024, 5:11 p.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

IRTest
Parameter Estimation of Item Response Theory with Estimation of Latent Distribution

IRT without the normality assumption"
In IRTest: Parameter Estimation of Item Response Theory with Estimation of Latent Distribution

0. Introduction

Installation

Functions

1. Dichotomous items

2. Polytomous items

3. Continuous items

4. Mixed-format test

5. Model comparison

Try the IRTest package in your browser

R Package Documentation

Browse R Packages

We want your feedback!

IRTest Parameter Estimation of Item Response Theory with Estimation of Latent Distribution

IRT without the normality assumption" In IRTest: Parameter Estimation of Item Response Theory with Estimation of Latent Distribution

0. Introduction

Installation

Functions

1. Dichotomous items

2. Polytomous items

3. Continuous items

4. Mixed-format test

5. Model comparison

Try the IRTest package in your browser

R Package Documentation

Browse R Packages

We want your feedback!

IRTest
Parameter Estimation of Item Response Theory with Estimation of Latent Distribution

IRT without the normality assumption"
In IRTest: Parameter Estimation of Item Response Theory with Estimation of Latent Distribution