catIrt | R Documentation |
catIrt simulates Computerized Adaptive Tests (CATs) given a vector/matrix of
responses or a vector of ability values, a matrix of item parameters, and several
item selection mechanisms, estimation procedures, and termination criteria.
catIrt( params, mod = c("brm", "grm"),
        resp = NULL,
        theta = NULL,
        catStart = list( n.start = 5, init.theta = 0,
                         select = c("UW-FI", "LW-FI", "PW-FI",
                                    "FP-KL", "VP-KL", "FI-KL", "VI-KL",
                                    "random"),
                         at = c("theta", "bounds"),
                         it.range = NULL, n.select = 1,
                         delta = .1,
                         score = c("fixed", "step", "random",
                                   "WLE", "BME", "EAP"),
                         range = c(-1, 1),
                         step.size = 3, leave.after.MLE = FALSE ),
        catMiddle = list( select = c("UW-FI", "LW-FI", "PW-FI",
                                     "FP-KL", "VP-KL", "FI-KL", "VI-KL",
                                     "random"),
                          at = c("theta", "bounds"),
                          it.range = NULL, n.select = 1,
                          delta = .1,
                          score = c("MLE", "WLE", "BME", "EAP"),
                          range = c(-6, 6),
                          expos = c("none", "SH") ),
        catTerm = list( term = c("fixed", "precision", "info", "class"),
                        score = c("MLE", "WLE", "BME", "EAP"),
                        n.min = 5, n.max = 50,
                        p.term = list(method = c("threshold", "change"),
                                      crit = .25),
                        i.term = list(method = c("threshold", "change"),
                                      crit = 2),
                        c.term = list(method = c("SPRT", "GLR", "CI"),
                                      bounds = c(-1, 1),
                                      categ = c(0, 1, 2),
                                      delta = .1,
                                      alpha = .05, beta = .05,
                                      conf.lev = .95) ),
        ddist = dnorm,
        progress = TRUE, ... )

## S3 method for class 'catIrt'
summary( object, group = TRUE, ids = "none", ... )

## S3 method for class 'catIrt'
plot( x, which = "all", ids = "none",
      conf.lev = .95, legend = TRUE, ask = TRUE, ... )
object, x |
a |
params |
numeric: a matrix of item parameters. If specified as a matrix,
the rows must index the items, and the columns must designate the item
parameters. For the binary response model, |
mod |
character: a character string indicating the IRT model. Current support
is for the 3-parameter binary response model ("brm"),
and Samejima's graded response model ("grm"). The contents
of |
resp |
numeric: either an N × J matrix (where N indicates the
number of simulees and J indicates the number of items), a
J-length vector (if there is only one simulee), or NULL if specifying |
theta |
numeric: either an N-dimensional vector (where N indicates the
number of simulees) or NULL if specifying |
catStart |
list: a list of options for starting the CAT including:
|
catMiddle |
list: a list of options for selecting/scoring during the middle of the CAT, including:
|
catTerm |
list: a list of options for stopping/terminating the CAT, including:
|
ddist |
function: a function indicating how to calculate prior densities
for Bayesian estimation or particular item selection methods. For instance,
if you wish to specify a normal prior, |
which |
numeric: a scalar or vector of integers between 1 and 4, indicating which plots to include. The plots are as follows:
|
group |
logical: TRUE or FALSE indicating whether to display a summary at the group level. |
ids |
numeric: a scalar or vector of integers between 1 and the number of
simulees, indicating the simulees whose CAT process and θ estimates
should be plotted and/or summarized. |
conf.lev |
numeric: a scalar between 0 and 1 indicating the desired confidence level plotted for the individual θ estimates. |
legend |
logical: TRUE or FALSE indicating whether the plot function should display a legend on the plot. |
ask |
logical: TRUE or FALSE indicating whether the plot function should ask between plots. |
progress |
logical: TRUE or FALSE indicating whether the |
... |
arguments passed to |
The function catIrt performs a post-hoc computerized adaptive test (CAT)
with a variety of user-specified inputs. For a given person/simulee (e.g. simulee i),
a CAT represents a simple set of stages surrounded by a while loop
(e.g. Weiss and Kingsbury, 1984):
Item Selection: The next item is chosen based on a pre-specified criterion/criteria.
For example, the classic item selection mechanism is picking the item that
maximizes Fisher Information at the current estimate of θ_i. Frequently,
content balancing, item constraints, or item exposure will be taken into consideration
at this point (aside from solely picking the "best item" for a given person).
See itChoose for current item selection methods.
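The classic maximum-information rule described above can be sketched in a few lines of base R. This is only an illustration, not the code behind itChoose: the helper names p_brm, fi_brm, and next_item are hypothetical, the guessing parameter is called g to avoid clashing with c(), and the logistic metric without a 1.7 scaling constant is an assumption.

```r
# 3PL response probability (assumed logistic metric, no 1.7 scaling constant)
p_brm <- function(theta, a, b, g) g + (1 - g) / (1 + exp(-a * (theta - b)))

# expected Fisher information of each item at theta under the 3PL
fi_brm <- function(theta, a, b, g) {
  p <- p_brm(theta, a, b, g)
  a^2 * ((1 - p) / p) * ((p - g) / (1 - g))^2
}

# hypothetical helper: index of the most informative item not yet administered
next_item <- function(theta_hat, params, used = integer(0)) {
  info <- fi_brm(theta_hat, params[, "a"], params[, "b"], params[, "c"])
  info[used] <- -Inf          # administered items can never be chosen again
  which.max(info)
}

set.seed(1)
bank   <- cbind(a = runif(20, .5, 1.5), b = rnorm(20, 0, 2), c = 0)
first  <- next_item(0, bank)                 # best item at theta = 0
second <- next_item(0, bank, used = first)   # best remaining item
```

Content balancing or exposure control would enter as extra filters on info before which.max is applied.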
Estimation: θ_i is estimated based on updated information, usually relating
to the just-selected item and the response associated with that item. In
a post-hoc CAT, all of the responses already exist, but in a standard CAT, "item administration"
would fall between "item selection" and "estimation." The classic estimation mechanism
is estimating θ_i by maximizing the likelihood given the item parameters and a set
of responses. Other estimation mechanisms correct for bias in the maximum likelihood
estimate or add prior information (such as a prior distribution of θ).
If an estimate is untenable (i.e. it returns a nonsensical value or ∞), the estimation
procedure needs to have an alternative estimation mechanism. See mleEst for
current estimation methods.
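As a rough illustration of maximum likelihood scoring with a fallback for untenable estimates, the sketch below uses base R's optimize. It is not the code behind mleEst: est_theta and loglik_brm are hypothetical names, the guessing parameter is called g, and the fallback (moving one fixed step from the previous estimate when all responses are identical) is an assumed stand-in for the package's "step" scoring.

```r
# log-likelihood of binary responses under the 3PL (assumed logistic metric)
loglik_brm <- function(theta, resp, a, b, g) {
  p <- g + (1 - g) / (1 + exp(-a * (theta - b)))
  sum(resp * log(p) + (1 - resp) * log(1 - p))
}

# hypothetical helper: MLE on a bounded interval, with a step fallback
est_theta <- function(resp, a, b, g, prev = 0, lim = c(-6, 6), step = 1) {
  # with a non-mixed response pattern the MLE is infinite, so step instead
  if (all(resp == 1)) return(min(prev + step, lim[2]))
  if (all(resp == 0)) return(max(prev - step, lim[1]))
  optimize(function(t) loglik_brm(t, resp, a, b, g),
           interval = lim, maximum = TRUE)$maximum
}

# two items placed symmetrically around 0, one right and one wrong:
mixed   <- est_theta(resp = c(1, 0), a = c(1, 1), b = c(-1, 1), g = c(0, 0))
# all responses correct, so the fallback moves up one step from prev = 0:
perfect <- est_theta(resp = c(1, 1), a = c(1, 1), b = c(-1, 1), g = c(0, 0))
```

In the symmetric example the likelihood peaks at θ = 0, so mixed should land close to zero, while perfect is simply the previous estimate plus the step.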
Termination: Either the test is terminated based on a pre-specified criterion/criteria,
or no termination criterion is satisfied, in which case the loop repeats. The standard
termination criteria involve a fixed criterion (e.g. administering only 50 items)
or a variable criterion (e.g. continuing until the observed SEM is below .3). Other
termination criteria relate to cut-point tests (e.g. certification tests, classification tests),
which depend not solely on ability but on whether that ability is estimated to exceed a threshold.
catIrt terminates classification tests based on either the Sequential Probability Ratio Test
(SPRT) (see Eggen, 1999), the Generalized Likelihood Ratio (GLR) (see Thompson, 2009), or the
Confidence Interval Method (see Kingsbury & Weiss, 1983). Essentially, the SPRT computes the ratio
of two likelihoods (e.g. the likelihood of the data given being in one category vs the likelihood
of the data given being in the other category, as defined by B + δ and
B - δ, where B separates the categories and δ is the halfwidth of the
indifference region) and compares that ratio with a ratio of error rates (α and
β) (see Wald, 1945). The GLR uses the maximum likelihood estimate in place of either
B + δ or B - δ, and the confidence interval method terminates a CAT if the
confidence interval surrounding an estimate of θ is fully within one of the categories.
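The SPRT decision for a single cut-point can be sketched as follows. This is a minimal base R illustration using Wald's (1945) standard decision bounds, not the termination code inside catIrt; sprt_decide is a hypothetical name, the guessing parameter is called g, and the labels "above", "below", and "continue" are made up for the example.

```r
# hypothetical helper: SPRT classification around one boundary
sprt_decide <- function(resp, a, b, g, bound = 0, delta = .1,
                        alpha = .05, beta = .05) {
  loglik <- function(theta) {
    p <- g + (1 - g) / (1 + exp(-a * (theta - b)))
    sum(resp * log(p) + (1 - resp) * log(1 - p))
  }
  # log of L(bound + delta) / L(bound - delta)
  log_lr <- loglik(bound + delta) - loglik(bound - delta)
  if (log_lr >= log((1 - beta) / alpha)) {
    "above"                     # strong evidence for the upper category
  } else if (log_lr <= log(beta / (1 - alpha))) {
    "below"                     # strong evidence for the lower category
  } else {
    "continue"                  # not enough evidence yet: administer more items
  }
}

# 30 identical items located exactly at the boundary
a <- rep(1, 30); b <- rep(0, 30); g <- rep(0, 30)
all_right <- sprt_decide(rep(1, 30), a, b, g, delta = .2)
all_wrong <- sprt_decide(rep(0, 30), a, b, g, delta = .2)
half_half <- sprt_decide(rep(c(1, 0), 15), a, b, g, delta = .2)
```

A GLR variant would replace one of the two fixed points with the maximum likelihood estimate, as the paragraph above describes.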
The CAT estimates θ_{i1} (an initial point) based on init.theta,
and terminates the entire simulation after sequentially terminating each simulee's CAT.
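Putting the three stages together, a post-hoc CAT for one simulee might look like the loop below. This is a simplified base R sketch under several assumptions, not the catIrt implementation: post_hoc_cat and its helpers are hypothetical names, item selection is unweighted maximum information, scoring is bounded maximum likelihood with a one-unit step fallback for non-mixed patterns, and termination combines a precision threshold with minimum and maximum test lengths.

```r
p_brm <- function(theta, a, b, g) g + (1 - g) / (1 + exp(-a * (theta - b)))

fi_brm <- function(theta, a, b, g) {
  p <- p_brm(theta, a, b, g)
  a^2 * ((1 - p) / p) * ((p - g) / (1 - g))^2
}

# hypothetical helper: one simulee's post-hoc CAT over an existing response vector
post_hoc_cat <- function(resp, params, init.theta = 0, n.min = 5, n.max = 20,
                         sem.crit = .3, lim = c(-6, 6)) {
  a <- params[, 1]; b <- params[, 2]; g <- params[, 3]
  used  <- integer(0)
  theta <- init.theta
  sem   <- Inf
  while (length(used) < n.max && (length(used) < n.min || sem > sem.crit)) {
    # 1. item selection: most informative unused item at the current estimate
    info <- fi_brm(theta, a, b, g)
    info[used] <- -Inf
    used <- c(used, which.max(info))
    # 2. estimation: bounded MLE, stepping by 1 while the pattern is non-mixed
    u <- resp[used]
    if (all(u == 1) || all(u == 0)) {
      theta <- if (u[1] == 1) min(theta + 1, lim[2]) else max(theta - 1, lim[1])
    } else {
      ll <- function(t) {
        p <- p_brm(t, a[used], b[used], g[used])
        sum(u * log(p) + (1 - u) * log(1 - p))
      }
      theta <- optimize(ll, interval = lim, maximum = TRUE)$maximum
    }
    # 3. termination input: SEM from expected information at the new estimate
    sem <- 1 / sqrt(sum(fi_brm(theta, a[used], b[used], g[used])))
  }
  list(theta = theta, sem = sem, items = used)
}

set.seed(888)
bank <- cbind(a = runif(100, .5, 1.5), b = rnorm(100, 0, 2), c = 0)
# responses of one simulated person with an assumed true theta of 0.5
resp <- as.numeric(runif(100) < p_brm(0.5, bank[, "a"], bank[, "b"], bank[, "c"]))
out  <- post_hoc_cat(resp, bank)
```

Running the same function over every row of a response matrix would correspond to terminating each simulee's CAT in sequence.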
The function catIrt returns a list (of class "catIrt") with the following elements:
cat_theta |
a vector of final CAT θ estimates. |
cat_categ |
a vector indicating the final classification of each simulee in the CAT. If
|
cat_info |
a vector of observed Fisher information based on the final CAT θ estimates and the item responses. |
cat_sem |
a vector of observed SEM estimates (or posterior standard deviations) based on the final CAT θ estimates and the item responses. |
cat_length |
a vector indicating the number of items administered to each simulee in the CAT. |
cat_term |
a vector indicating how each CAT was terminated. |
tot_theta |
a vector of θ estimates given the entire item bank. |
tot_categ |
a vector indicating the classification of each simulee given the entire item bank. |
tot_info |
a vector of observed Fisher information based on the entire item bank worth of responses. |
tot_sem |
a vector of observed SEM estimates based on the entire item bank worth of responses. |
true_theta |
a vector of true θ values if specified by the user. |
true_categ |
a vector of true classification given θ. |
full_params |
the full item bank. |
full_resp |
the full set of responses. |
cat_indiv |
a list of θ estimates, observed SEM, observed information, the responses and the parameters chosen for each simulee over the entire CAT. |
mod |
a list of model specifications, as designated by the user, so that the CAT can be easily reproduced. |
Both summary.catIrt and plot.catIrt return different objects than the original
catIrt function. summary.catIrt returns labeled summary statistics, and
plot.catIrt returns evaluation points (x values, information, and SEM) for each
of the plots. Moreover, if R is in interactive mode and parts of the catStart, catMiddle,
or catTerm arguments are missing, the catIrt function will interactively ask for each of those
and return the set of arguments in the "catIrt" object.
Steven W. Nydick swnydick@gmail.com
Eggen, T. J. H. M. (1999). Item selection in adaptive testing with the sequential probability ratio test. Applied Psychological Measurement, 23, 249–261.
Kingsbury, G. G., & Weiss, D. J. (1983). A comparison of IRT-based adaptive mastery testing and a sequential mastery testing procedure. In D. J. Weiss (Ed.), New horizons in testing: Latent trait test theory and computerized adaptive testing (pp. 257–283). New York, NY: Academic Press.
Thompson, N. A. (2009). Using the generalized likelihood ratio as a termination criterion. In D. J. Weiss (Ed.), Proceedings of the 2009 GMAC conference on computerized adaptive testing.
Wainer, H. (Ed.). (2000). Computerized Adaptive Testing: A Primer (2nd Edition). Mahwah, NJ: Lawrence Erlbaum Associates.
Wald, A. (1945). Sequential tests of statistical hypotheses. Annals of Mathematical Statistics, 16, 117–186.
Weiss, D. J., & Kingsbury, G. G. (1984). Application of computerized adaptive testing to educational problems. Journal of Educational Measurement, 21, 361–375.
FI, itChoose, KL, mleEst, simIrt
## Not run: 

#########################
# Binary Response Model #
#########################
set.seed(888)
# generating random theta:
theta <- rnorm(50)
# generating an item bank under a 2-parameter binary response model:
b.params <- cbind(a = runif(100, .5, 1.5), b = rnorm(100, 0, 2), c = 0)
# simulating responses:
b.resp <- simIrt(theta = theta, params = b.params, mod = "brm")$resp


## CAT 1 ##
# the typical, classic post-hoc CAT:
catStart1 <- list(init.theta = 0, n.start = 5,
                  select = "UW-FI", at = "theta",
                  n.select = 4, it.range = c(-1, 1),
                  score = "step", range = c(-1, 1),
                  step.size = 3, leave.after.MLE = FALSE)
catMiddle1 <- list(select = "UW-FI", at = "theta",
                   n.select = 1, it.range = NULL,
                   score = "MLE", range = c(-6, 6),
                   expos = "none")
catTerm1 <- list(term = "fixed", n.min = 10, n.max = 50)

cat1 <- catIrt(params = b.params, mod = "brm",
               resp = b.resp,
               catStart = catStart1,
               catMiddle = catMiddle1,
               catTerm = catTerm1)

# we can print, summarize, and plot:
cat1                                        # prints theta because
                                            # we have fewer than
                                            # 200 simulees
summary(cat1, group = TRUE, ids = "none")   # nice summary!

summary(cat1, group = FALSE, ids = 1:4)     # summarizing people too! :)

par(mfrow = c(2, 2))
plot(cat1, ask = FALSE)               # 2-parameter model, so expected FI
                                      # and observed FI are the same
par(mfrow = c(1, 1))

# we can also plot particular simulees:
par(mfrow = c(2, 1))
plot(cat1, which = "none", ids = c(1, 30), ask = FALSE)
par(mfrow = c(1, 1))


## CAT 2 ##
# using Fixed Point KL info rather than Unweighted FI to select items:
catStart2 <- catStart1
catMiddle2 <- catMiddle1
catTerm2 <- catTerm1

catStart2$leave.after.MLE <- TRUE         # leave after mixed response pattern
catMiddle2$select <- "FP-KL"
catMiddle2$at <- "bounds"
catMiddle2$delta <- .2
catTerm2$c.term <- list(bounds = 0)
cat2 <- catIrt(params = b.params, mod = "brm",
               resp = b.resp,
               catStart = catStart2,
               catMiddle = catMiddle2,
               catTerm = catTerm2)
cor(cat1$cat_theta, cat2$cat_theta)       # very close!

summary(cat2, group = FALSE, ids = 1:4)   # rarely 5 starting items!


## CAT 3/4 ##
# using "precision" rather than "fixed" to terminate:
catTerm1$term <- catTerm2$term <- "precision"
catTerm1$p.term <- catTerm2$p.term <- list(method = "threshold", crit = .3)
cat3 <- catIrt(params = b.params, mod = "brm",
               resp = b.resp,
               catStart = catStart1,
               catMiddle = catMiddle1,
               catTerm = catTerm1)
cat4 <- catIrt(params = b.params, mod = "brm",
               resp = b.resp,
               catStart = catStart2,
               catMiddle = catMiddle2,
               catTerm = catTerm2)

mean(cat3$cat_length - cat4$cat_length)   # KL info results in slightly more items


## CAT 5/6 ##
# classification CAT with a boundary of 0 (with default classification stuff):
catTerm5 <- list(term = "class", n.min = 10, n.max = 50,
                 c.term = list(method = "SPRT",
                               bounds = 0, delta = .2,
                               alpha = .10, beta = .10))
cat5 <- catIrt(params = b.params, mod = "brm",
               resp = b.resp,
               catStart = catStart1,
               catMiddle = catMiddle1,
               catTerm = catTerm5)
cat6 <- catIrt(params = b.params, mod = "brm",
               resp = b.resp,
               catStart = catStart1,
               catMiddle = catMiddle2,
               catTerm = catTerm5)

# how many were classified correctly?
mean(cat5$cat_categ == cat5$tot_categ)

# using a different selection mechanism, we get similar results:
mean(cat6$cat_categ == cat6$tot_categ)


## CAT 7 ##
# we could change estimation to EAP with the default (normal) prior:
catMiddle7 <- catMiddle1
catMiddle7$score <- "EAP"
cat7 <- catIrt(params = b.params, mod = "brm",   # much slower!
               resp = b.resp,
               catStart = catStart1,
               catMiddle = catMiddle7,
               catTerm = catTerm1)
cor(cat1$cat_theta, cat7$cat_theta)              # pretty much the same


## CAT 8 ##
# let's specify the prior as something strange:
cat8 <- catIrt(params = b.params, mod = "brm",
               resp = b.resp,
               catStart = catStart1,
               catMiddle = catMiddle7,
               catTerm = catTerm1,
               ddist = dchisq, df = 4)

cat8   # all positive values of "theta"


## CAT 9 ##
# finally, we can have:
#   - more than one termination criterion,
#   - individual bounds per person,
#   - simulating based on theta without a response matrix.
catTerm9 <- list(term = c("fixed", "class"),
                 n.min = 10, n.max = 50,
                 c.term = list(method = "SPRT",
                               bounds = cbind(runif(length(theta), -1, 0),
                                              runif(length(theta), 0, 1)),
                               delta = .2,
                               alpha = .1, beta = .1))
cat9 <- catIrt(params = b.params, mod = "brm",
               resp = NULL, theta = theta,
               catStart = catStart1,
               catMiddle = catMiddle1,
               catTerm = catTerm9)

summary(cat9)   # see "... with Each Termination Criterion"


#########################
# Graded Response Model #
#########################
# generating random theta
theta <- rnorm(201)
# generating an item bank under a graded response model:
g.params <- cbind(a = runif(100, .5, 1.5), b1 = rnorm(100), b2 = rnorm(100),
                  b3 = rnorm(100), b4 = rnorm(100))

# the graded response model is exactly the same, only slower!
cat10 <- catIrt(params = g.params, mod = "grm",
                resp = NULL, theta = theta,
                catStart = catStart1,
                catMiddle = catMiddle1,
                catTerm = catTerm1)

# warning because it.range cannot be specified for graded response models!

# if there are more than 200 simulees, it doesn't print individual thetas:
cat10

## End(Not run)

# play around with things - CATs are fun - a little frisky, but fun.