Computerized Adaptive Testing (CAT) simulation program
Computerized adaptive testing (CAT) aims to enhance efficiency and precision in measurement by optimizing tests for individual examinees. Optimization is accomplished by administering test items that are appropriate for each individual examinee. Measurement efficiency is gained by administering fewer but more informative items to achieve a given level of precision. Thus, the test length can be reduced (if the test length is allowed to vary) without a loss of measurement precision. The gains in efficiency and precision also improve the test-taking experience by reducing the testing-time burden. Another objective of CAT that has become more relevant in recent years is the capability to administer more complex and innovative item types and formats that award multiple score points.
Firestar (Choi, 2009) was developed to provide an open-source platform to systematically evaluate item banks, select items for static forms, determine optimal CAT environment settings, and compare the measurement efficiency of CAT with that of static forms (Choi et al., 2010). Having access to open-source CAT algorithms, which do not rely on proprietary optimization routines, is not only pedagogically instructive but also facilitates the understanding and advancement of the science and research behind CAT.
The first rendition of Firestar (Choi, 2009) was developed for simulating CAT with polytomous items. The number of response categories can vary across items. The item response theory (IRT) models supported by the program include Samejima's (1969) graded response model (GRM) and Muraki's (1992) generalized partial credit model (GPCM). Both Masters' (1982) partial credit model (PCM) and Andrich's (1978) rating scale model (RSM) can be supported as special cases of the GPCM. For the PCM, the slope parameters for all items should be set to 1.0 or to a common value (a positive real number). For the RSM, the threshold parameters shared by all items can be combined with each item's location parameter (scale value) and expressed as PCM parameters. The program also supports dichotomous items: the two-parameter logistic model (Birnbaum, 1968) is supported as a special case of the GRM or the GPCM with two (ordered) categories. The Rasch model (Rasch, 1960) is also supported, but only as a special case of the two-parameter logistic model with the slope parameter set to 1.0.
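The special-case relationships among these models can be illustrated in a few lines of Python. This is a minimal sketch, not Firestar's own code (Firestar is an R package): the function names are hypothetical, the logistic metric is assumed without the 1.7 scaling constant, and the GPCM is written with item step parameters b_1..b_m. It shows the PCM as a GPCM with a unit slope, RSM parameters rewritten as PCM step parameters by adding the item location to the shared thresholds, and the GRM and GPCM both reducing to the 2PL when an item has two categories.

```python
import math

# Hypothetical helper names; logistic metric assumed, no scaling constant D.


def gpcm_probs(theta, a, steps):
    """GPCM category probabilities for categories 0..m.

    steps: step parameters b_1..b_m (need not be ordered).
    """
    # z_k = sum_{v=1}^{k} a * (theta - b_v), with z_0 = 0 (empty sum)
    z = [0.0]
    for b in steps:
        z.append(z[-1] + a * (theta - b))
    # Shift by the maximum before exponentiating to avoid overflow
    zmax = max(z)
    expz = [math.exp(v - zmax) for v in z]
    total = sum(expz)
    return [e / total for e in expz]


def grm_probs(theta, a, thresholds):
    """Samejima GRM category probabilities for categories 0..m.

    thresholds: ordered boundary parameters b_1 < ... < b_m.
    """
    # Boundary curves P*_k = P(X >= k), padded with P*_0 = 1 and P*_{m+1} = 0
    star = [1.0]
    star += [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds]
    star.append(0.0)
    # Each category probability is the difference of adjacent boundary curves
    return [star[k] - star[k + 1] for k in range(len(thresholds) + 1)]


def pcm_probs(theta, steps):
    """PCM as a GPCM with the slope fixed at 1.0."""
    return gpcm_probs(theta, 1.0, steps)


def rsm_to_pcm_steps(location, taus):
    """Combine an item location with thresholds shared by all items (RSM)
    into that item's PCM step parameters."""
    return [location + t for t in taus]


def two_pl(theta, a, b):
    """2PL probability of responding in the upper of two categories."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


if __name__ == "__main__":
    theta, a, b = 0.5, 1.3, -0.2
    # With two categories, the GRM and the GPCM both reduce to the 2PL
    print(gpcm_probs(theta, a, [b])[1], grm_probs(theta, a, [b])[1], two_pl(theta, a, b))
    # An RSM item: location 0.4 plus thresholds shared across the item pool
    print(pcm_probs(theta, rsm_to_pcm_steps(0.4, [-1.0, 0.0, 1.0])))
```

The three values printed on the first line coincide, which is the sense in which a dichotomous item can be entered as either a two-category GRM or a two-category GPCM item.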
Firestar-D (Choi et al., 2012) was subsequently developed on the same platform as its predecessor, specifically for the three-parameter logistic (3PL) IRT model (Birnbaum, 1968). Firestar and Firestar-D provide traditional item selection methods, interim and final theta estimators, and an array of optional output files for secondary analyses. The programs also support a range of alternatives for item selection (van der Linden, 1998; van der Linden & Pashley, 2000), content balancing (Kingsbury & Zara, 1989), exposure control (Revuelta & Ponsoda, 1998), provisional and final theta estimation, stopping, and output control.
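To make the 3PL case and a traditional item selection step concrete, the sketch below computes 3PL response probabilities and item information, then picks the unadministered item with maximum Fisher information at the interim theta estimate. Maximum-information selection is assumed here as one representative traditional criterion; Firestar's actual selection options, content balancing, and exposure control are not reproduced, the function names are hypothetical, and the 1.7 scaling constant is omitted.

```python
import math

# Hypothetical helper names; 3PL in the logistic metric without D = 1.7.


def p3pl(theta, a, b, c):
    """3PL probability of a correct response."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))


def info3pl(theta, a, b, c):
    """Fisher information of a 3PL item at theta.

    Reduces to a^2 * P * Q when the lower asymptote c is 0.
    """
    p = p3pl(theta, a, b, c)
    q = 1.0 - p
    return a * a * (q / p) * ((p - c) / (1.0 - c)) ** 2


def select_max_info(theta_hat, pool, administered):
    """Index of the unused item with maximum information at theta_hat.

    pool: list of (a, b, c) tuples; administered: set of used indices.
    Returns None when every item has already been administered.
    """
    best, best_info = None, -1.0
    for j, (a, b, c) in enumerate(pool):
        if j in administered:
            continue
        info = info3pl(theta_hat, a, b, c)
        if info > best_info:
            best, best_info = j, info
    return best


if __name__ == "__main__":
    pool = [(1.0, -1.0, 0.2), (1.5, 0.0, 0.2), (0.8, 0.5, 0.25)]
    print(select_max_info(0.0, pool, set()))  # most informative item at theta = 0
    print(select_max_info(0.0, pool, {1}))    # next choice once item 1 is used
```

In a full CAT loop this selection step would alternate with updating the interim theta estimate from the responses so far, until a stopping rule is met.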
Firestar has been implemented in operational CAT programs (Cella et al., 2007) and is currently being used in health outcomes and clinical research studies across the nation (www.NIHPROMIS.org). Built on an open-source platform, the Firestar engines are fully open to inspection, to facilitate collaborations in the research community for further validation and enhancement.
The current version of Firestar consolidates both Firestar (Choi, 2009) and Firestar-D (Choi et al., 2012) into a single R package, leveraging some of the S4 classes and Rcpp functions available in the TestDesign package. As a result, it can handle item pools with any mixture of the IRT models mentioned above.