jrt: Fit ordinal IRT models on judgment data and return factor...



This function automatically selects an appropriate polytomous IRT model based on an information criterion (e.g. corrected AIC), then returns factor scores, standard errors and various IRT psychometric information, as well as more traditional ("CTT") psychometric information. All IRT estimation procedures are executed with the package mirt (Chalmers, 2012). The non-IRT procedures use the packages psych and irr.


jrt(data, irt.model = "auto", summary = T,
  selection.criterion = "AICC", response.categories = "auto",
  remove.judges.with.unobserved.categories = F, additional.stats = F,
  method.factor.scores = "EAP", return.mean.scores = T,
  prefix.for.outputs = "Judgments", column.names = "Judge",
  maximum.iterations = 2000, convergence.threshold = 0.001,
  estimation.algorithm = "EM", status.verbose = F,
  estimation.package.warnings = F, digits = 3, plots = T,
  greyscale = F, progress.bar = T, method.item.fit = "X2",
  select.variables.that.contain = NULL, silent = F, show.calls = F,
  debug = F)



data: A dataframe or matrix including the judgments to be scored. Note that so far missing data are not supported. This is the only required argument for the function.


irt.model: A string value with the name of the model to fit. It can be:

  • "auto" (default) or NULL : Empirically select the model based on an information criterion (see selection.criterion argument).

    Difference models (more or less constrained versions of the Graded Response Model)

    • "GRM": Graded Response Model

    • "CGRM": Constrained Graded Response Model (equal discriminations)

    • "GrRSM": Graded Rating Scale Model (same category structures)

    • "CGrRSM": Constrained Graded Rating Scale Model (same category structures and equal discriminations)

    Divide-by-total models (more or less constrained versions of the Generalized Partial Credit Model)

    • "GPCM": Generalized Partial Credit Model

    • "PCM": Partial Credit Model (equal discriminations)

    • "GRSM": Generalized Rating Scale Model (same category structures)

    • "RSM": Rating Scale Model (same category structures and equal discriminations)

For convenience, models can also be called by their full names (e.g. both "Generalized Rating Scale Model" and "Generalized Rating Scale" work).

  • Note: Models where judges are constrained to the same category structures (Graded Rating Scale Model, Constrained Graded Rating Scale Model, Generalized Rating Scale Model and Rating Scale Model) cannot be fit if judges have different observed categories. Judges with unobserved categories are automatically removed if these models are called. If the automatic model selection is used, these models are ignored in the comparison by default; to include them by removing such judges from the comparison, set remove.judges.with.unobserved.categories = T.


summary: A logical to indicate if summary statistics should be displayed as messages (default is TRUE).


selection.criterion: A string with the criterion for the automatic selection. The default is the corrected Akaike Information Criterion (AICC), but other criteria may be used (AIC, BIC and SABIC).
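To make the difference between these criteria concrete, here is a minimal base R sketch using their standard textbook definitions. The function name `information_criteria` and the log-likelihoods, parameter counts and sample size are hypothetical, and the formulas are assumed (not confirmed by this page) to match what the estimation package reports.

```r
# Information criteria from a model's log-likelihood (logLik),
# number of estimated parameters (k) and sample size (n).
# Standard definitions; assumed to match the values jrt compares.
information_criteria <- function(logLik, k, n) {
  aic <- -2 * logLik + 2 * k
  c(
    AIC   = aic,
    AICC  = aic + (2 * k * (k + 1)) / (n - k - 1),
    BIC   = -2 * logLik + k * log(n),
    SABIC = -2 * logLik + k * log((n + 2) / 24)
  )
}

# Hypothetical values for two candidate models fitted to n = 100 products
candidates <- rbind(
  GRM = information_criteria(logLik = -820, k = 30, n = 100),
  PCM = information_criteria(logLik = -835, k = 19, n = 100)
)

# The model with the smallest value of the chosen criterion is retained
best_by_aic  <- rownames(candidates)[which.min(candidates[, "AIC"])]
best_by_aicc <- rownames(candidates)[which.min(candidates[, "AICC"])]
```

With these made-up numbers the two criteria disagree: the small-sample correction in AICC penalizes the more heavily parameterized model more strongly than AIC does.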


response.categories: A numeric vector to indicate the possible score values. For example, use 1:7 for a Likert-type score from 1 to 7. The default, "auto", automatically detects the possible values based on the dataset provided.


remove.judges.with.unobserved.categories: A logical value to indicate whether to only keep the judges with all categories observed (based on the response.categories argument). The models that constrain judges to the same category structures (GrRSM, CGrRSM, GRSM and RSM) can only be estimated if the same categories are observed for all judges. If set to TRUE, "incomplete judges" are removed only to fit models that require it, and for other models when they are compared to them (to allow meaningful model comparisons). It defaults to FALSE to keep all the data available, and has no effect if models that do not require "complete judges" are called.
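The notion of an "incomplete judge" can be checked on a dataset before fitting. The following base R sketch only illustrates the idea and is not the package's internal code; the data frame `ratings_example` and all object names are hypothetical.

```r
# Hypothetical judgments: 6 products rated by 3 judges on a 1-5 scale
ratings_example <- data.frame(
  Judge_1 = c(1, 2, 3, 4, 5, 3),
  Judge_2 = c(2, 2, 3, 4, 4, 3),  # never uses categories 1 or 5
  Judge_3 = c(1, 3, 5, 4, 2, 2)
)

# Categories observed anywhere in the data (analogous to the "auto" detection)
response_categories <- sort(unique(unlist(ratings_example)))

# A judge is "complete" if every possible category appears in their column
is_complete <- vapply(
  ratings_example,
  function(x) all(response_categories %in% x),
  logical(1)
)

incomplete_judges <- names(is_complete)[!is_complete]
complete_ratings  <- ratings_example[, is_complete, drop = FALSE]
```

Here `incomplete_judges` contains only `Judge_2`, so a rating scale type model could be fit to `complete_ratings` but not to the full data frame.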


additional.stats: A logical to indicate whether to report other ("non-IRT") reliability statistics (based on computations from packages 'psych' and 'irr'). Defaults to FALSE.


method.factor.scores: A string to indicate the method used to compute the factor scores. Bayesian methods (EAP, MAP) are recommended. Defaults to Expected A Posteriori (EAP) based on a Standard Normal N(0,1) prior distribution. Alternatively, Maximum A Posteriori (MAP) with a Standard Normal N(0,1) prior may be used. Maximum Likelihood (ML) is also possible (it is equivalent to using a uniform prior), but it is discouraged as it can produce -Inf and +Inf factor scores (for which standard errors will be missing). Weighted Likelihood Estimation (WLE) may also be used.
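As a rough illustration of what an EAP score is, the sketch below computes it by numerical integration over a grid for a Graded Response Model with a N(0,1) prior. The judge parameters are hypothetical and the functions are written from the textbook definitions; this is not how jrt or mirt implement scoring internally.

```r
# Hypothetical GRM parameters for 3 judges rating on 4 categories
discrimination <- c(1.5, 1.0, 2.0)
thresholds <- rbind(c(-1.0, 0.0, 1.0),
                    c(-1.5, 0.2, 1.2),
                    c(-0.5, 0.5, 1.5))

# Category probabilities under the Graded Response Model for one judge
grm_probs <- function(theta, a, b) {
  cumulative <- c(1, plogis(a * (theta - b)), 0)  # P(X >= k)
  -diff(cumulative)                               # P(X = k)
}

# Likelihood of a response pattern (one category per judge) at a given theta
pattern_likelihood <- function(theta, responses) {
  prod(vapply(seq_along(responses), function(j) {
    grm_probs(theta, discrimination[j], thresholds[j, ])[responses[j]]
  }, numeric(1)))
}

# EAP: posterior mean over a grid, with the posterior SD as standard error
eap_score <- function(responses, grid = seq(-6, 6, length.out = 241)) {
  likelihood <- vapply(grid, pattern_likelihood, numeric(1),
                       responses = responses)
  posterior <- likelihood * dnorm(grid)  # N(0, 1) prior
  posterior <- posterior / sum(posterior)
  eap <- sum(grid * posterior)
  c(EAP = eap, SE = sqrt(sum((grid - eap)^2 * posterior)))
}

low    <- eap_score(c(1, 1, 1))  # lowest category from every judge
middle <- eap_score(c(2, 3, 2))
high   <- eap_score(c(4, 4, 4))  # highest category from every judge
```

For the all-highest pattern the likelihood alone keeps increasing with theta, which is why ML scoring can return +Inf, whereas the prior keeps the EAP estimate and its standard error finite.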


return.mean.scores: A logical to indicate whether to return the mean scores in the output (defaults to TRUE).


prefix.for.outputs: A character used as prefix to name the vectors in the output data frames. Default is "Judgments".


column.names: A character to indicate the preferred name to give to a Judge. Defaults to "Judge".


maximum.iterations: A numeric indicating the maximum number of iterations used to fit the model (default is 2000).


convergence.threshold: A numeric to indicate the convergence tolerance threshold (default is .001). Reduce it for increased precision (at the cost of slower estimation or non-convergence).


estimation.algorithm: A string indicating the estimation algorithm. Can notably be EM for Bock and Aitkin's Expectation-Maximization (default) or MHRM for the Metropolis-Hastings Robbins-Monro algorithm (usually slower for unidimensional models).


status.verbose: A logical to indicate whether to output messages indicating what the package is doing. Defaults to FALSE.


estimation.package.warnings: A logical to indicate whether to output the warnings and messages of the estimation package. Defaults to FALSE for a cleaner output; set it to TRUE if you experience issues with the estimation.


digits: A numeric to indicate the number of digits to which output statistics are rounded (default is 3).


plots: A logical to indicate whether to plot the total information plot and judge category curves (TRUE, default) or not (FALSE).


greyscale: A logical to indicate whether the plots should be in greyscale (TRUE) or color (FALSE, default).


progress.bar: A logical to indicate whether to show a progress bar during the automatic model selection. Defaults to TRUE.


method.item.fit: A character value to indicate which fit statistic to use for the item fit output. Passed to the itemfit function of the mirt package. Can be S_X2, Zh, X2, G2, PV_Q1, PV_Q1*, X2*, X2*_df, infit. Note that some may not be computable if there are missing data.


select.variables.that.contain: A character string; only the variables of the original dataset whose names contain this string are used as data. Based on the select function of dplyr. For example, if the names of all your judgment variables include "Rater", use "Rater" to filter your dataset here.
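The effect of this filtering can be pictured with a base R stand-in. The sketch below uses `grepl()` rather than dplyr, so details such as case sensitivity may differ from what the package actually does; the data frame and its column names are hypothetical.

```r
# Hypothetical dataset mixing judgment columns with other variables
dataset <- data.frame(
  id      = 1:3,
  Rater_A = c(3, 4, 2),
  Rater_B = c(2, 5, 1),
  age     = c(21, 34, 28)
)

# Keep only the columns whose names contain the string "Rater"
pattern   <- "Rater"
judgments <- dataset[, grepl(pattern, names(dataset), fixed = TRUE),
                     drop = FALSE]
```

Only `Rater_A` and `Rater_B` remain in `judgments`, which is the kind of subset the function would then score.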


silent: A logical (defaults to FALSE) to request no output (no message or plot) other than the jrt object. It sets the other relevant parameters (progress.bar, estimation.package.warnings, plots, summary) so that the call runs silently. Useful if only using the package for factor scoring, for example.
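A minimal sketch of this factor-scoring use case follows. It reuses the `ratings` dataset and the `factor.scores.vector` slot from the Examples section, and it is guarded so that it only runs when the jrt package is installed; the object name `scores` is arbitrary.

```r
# Silent factor scoring: no messages, progress bar or plots are expected
scores <- NULL
if (requireNamespace("jrt", quietly = TRUE)) {
  fit <- jrt::jrt(jrt::ratings, irt.model = "GRM", silent = TRUE)
  scores <- fit@factor.scores.vector  # numeric vector of factor scores
}
```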


show.calls: A logical to report the calls made to fit the different models. This is meant as a didactic option for users who may be interested in switching over to mirt directly. Defaults to FALSE.


debug: A logical to report debug messages (used in development). Defaults to FALSE.


An object of S4-class jrt. The factor scores can be accessed in slot @output.data.


Chalmers, R. P. (2012). mirt: A Multidimensional Item Response Theory Package for the R Environment. Journal of Statistical Software, 48(6), 1-29. doi: 10.18637/jss.v048.i06

Myszkowski, N., & Storme, M. (2019). Judge Response Theory? A call to upgrade our psychometrical account of creativity judgments. Psychology of Aesthetics, Creativity, and the Arts, 13(2), 167-175. doi: 10.1037/aca0000225


# Load the package and the example dataset
library(jrt)
data <- jrt::ratings

# Fit models
fit <- jrt(data,
  irt.model = "GRM", # to manually select a model
  plots = FALSE) # to remove plots

# Extract the factor scores
fit@factor.scores # In a dataframe with standard errors
fit@factor.scores.vector # As a numeric vector

# See vignette for more options

jrt documentation built on May 6, 2019, 5:01 p.m.