| q3_statistic | R Documentation |
Computes a Q3-style index inspired by Yen (1984) – the Pearson
correlation of standardized residuals between every pair of
levels of a chosen facet – from a diagnose_mfrm() bundle. Under
the conditional-independence assumption of the MFRM, |Q3| should be
small for every pair; large absolute values flag pairs of facet
elements (e.g. two raters or two items) whose residuals co-move
more than the main-effects model expects.
q3_statistic(
fit,
diagnostics = NULL,
facet = "Rater",
min_pairs = 5L,
yen_threshold = 0.2,
marais_threshold = 0.3,
relative_offset = 0.2
)
fit |
An |
diagnostics |
Optional |
facet |
Facet whose levels are paired (default |
min_pairs |
Minimum number of shared response opportunities
required to retain a pair. Pairs below the threshold drop out
of the table (mirrors |
yen_threshold |
Community-convention flag threshold (default
|
marais_threshold |
Stricter community-convention threshold
(default |
relative_offset |
Screening offset for the relative-flag rule
|
An object of class mfrm_q3 containing:
pairs |
A data frame with one row per facet-level pair
and columns Level1, Level2, Q3, N, AbsQ3,
YenFlag, MaraisFlag, RelativeFlag, and a textual
Interpretation summarising which thresholds were exceeded.
summary |
One-row tibble with MeanQ3, MaxAbsQ3,
and the three flagged-pair counts.
thresholds |
The thresholds used, for reproducibility.
facet |
The facet whose levels were paired.
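The flag columns can be pictured with a small base-R sketch. The Q3 values below are made up, the table name pair_tbl is hypothetical, and the comparison rules are assumptions for illustration: absolute-value comparisons for the two fixed thresholds, and "Q3 exceeds the mean Q3 by more than the offset" for the relative rule, in the spirit of Christensen et al. (2017). The package's actual rules may differ.

```r
# Hypothetical Q3 values for three rater pairs (not package output).
pair_tbl <- data.frame(
  Level1 = c("R1", "R1", "R2"),
  Level2 = c("R2", "R3", "R3"),
  Q3     = c(0.05, -0.25, 0.35)
)

# Defaults taken from the usage signature.
yen_threshold    <- 0.2
marais_threshold <- 0.3
relative_offset  <- 0.2

pair_tbl$AbsQ3      <- abs(pair_tbl$Q3)
# Assumed rule: flag when |Q3| is strictly above the fixed threshold.
pair_tbl$YenFlag    <- pair_tbl$AbsQ3 > yen_threshold
pair_tbl$MaraisFlag <- pair_tbl$AbsQ3 > marais_threshold
# Assumed relative rule: Q3 exceeds the mean Q3 by more than the offset.
pair_tbl$RelativeFlag <- pair_tbl$Q3 > mean(pair_tbl$Q3) + relative_offset

pair_tbl
```

In this toy table the R2-R3 pair trips all three rules, while R1-R3 trips only the more lenient fixed threshold because its Q3 is negative.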
This implementation differs from Yen's (1984) original definition in two respects that together affect threshold interpretation.
(1) Standardized vs raw residuals. Yen (1984, eqs. 7-8, p. 127)
defines Q3 = cor(d_i, d_j) where d_{ik} = u_{ik} - P_hat_{ik} is
the raw residual. mfrmr uses standardized residuals
Z = (u - P_hat) / sqrt(Var(u)) because that is what
diagnose_mfrm() stores. Standardization down-weights high-variance
observations and changes the sampling distribution of the resulting
correlation; the published critical values (Chen & Thissen, 1997;
Christensen et al., 2017) were derived for raw-residual Q3.
(2) Mean-aggregation. When the facet being paired (e.g. Rater)
has multiple residual rows per (Person, Level) cell because of
additional facets in the design (e.g. multiple Criterion rows per
Person-Rater cell), the standardized residuals are first
mean-aggregated to one value per (Person, Level) cell, and the
Pearson correlation is taken over those mean-aggregated residuals.
Yen's original formulation takes the correlation directly over
per-(Person, Item) residuals, without aggregation. Mean-aggregation
reduces noise but also shrinks the effective sample size and can pull
correlations toward the cell mean.
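The mean-aggregation step in (2) can be sketched in base R as follows. This is a minimal illustration on simulated data, not the package's internal code; the column name StdResidual and the random values standing in for standardized residuals are assumptions.

```r
set.seed(2)

# Hypothetical long-format residual table: 30 persons x 2 raters x 3 criteria.
res <- expand.grid(Person = 1:30,
                   Rater = c("R1", "R2"),
                   Criterion = c("C1", "C2", "C3"),
                   stringsAsFactors = FALSE)
res$StdResidual <- rnorm(nrow(res))  # stand-in for standardized residuals

# Step 1: average over Criterion to get one value per (Person, Rater) cell.
cell <- aggregate(StdResidual ~ Person + Rater, data = res, FUN = mean)

# Step 2: put each rater's cell means side by side, matched on Person.
wide <- reshape(cell, idvar = "Person", timevar = "Rater",
                direction = "wide")

# Step 3: Pearson correlation over persons observed under both raters.
shared   <- complete.cases(wide[, c("StdResidual.R1", "StdResidual.R2")])
n_shared <- sum(shared)
q3_r1_r2 <- cor(wide$StdResidual.R1[shared], wide$StdResidual.R2[shared])

c(N = n_shared, Q3 = q3_r1_r2)
```

Here n_shared plays the role of the shared-opportunity count that min_pairs is compared against: with three criteria per cell, 180 residual rows collapse to 30 paired observations.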
For both reasons, treat the values returned here as a screening summary rather than a direct substitute for the published Q3 thresholds. For a formal local-dependence test under raw-residual Q3, use a parametric bootstrap as recommended by Christensen et al. (2017).
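To make the residual-scale difference in point (1) concrete, the sketch below computes both versions of the correlation for one simulated item pair. It assumes a simple dichotomous Rasch-type model with known, made-up person and item parameters, so it illustrates the two formulas only and does not reproduce what diagnose_mfrm() stores.

```r
set.seed(1)
n     <- 200
theta <- rnorm(n)        # hypothetical person measures
b     <- c(-0.5, 0.5)    # hypothetical item difficulties

# Model-implied success probabilities, one column per item.
p_hat <- sapply(b, function(bj) plogis(theta - bj))

# Simulated 0/1 responses; column-major order matches p_hat.
u <- matrix(rbinom(n * 2, 1, p_hat), nrow = n)

raw <- u - p_hat                         # Yen's d = u - P_hat
std <- raw / sqrt(p_hat * (1 - p_hat))   # Z = (u - P_hat) / sqrt(Var(u))

q3_raw <- cor(raw[, 1], raw[, 2])
q3_std <- cor(std[, 1], std[, 2])
c(raw = q3_raw, standardized = q3_std)
```

Because the data are generated with conditional independence, both values should sit near zero, yet they will generally not be equal, since dividing by the binomial standard deviation reweights each observation.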
Yen, W. M. (1984). Effects of local item dependence on the fit and equating performance of the three-parameter logistic model. Applied Psychological Measurement, 8(2), 125-145. doi:10.1177/014662168400800201
Chen, W.-H., & Thissen, D. (1997). Local dependence indexes for
item pairs using item response theory. Journal of Educational
and Behavioral Statistics, 22(3), 265-289. (Origin of the
commonly cited |Q3| > 0.20 cutoff.)
Marais, I. (2013). Local dependence. In K. B. Christensen, S. Kreiner, & M. Mesbah (Eds.), Rasch models in health (pp. 111-130). London: ISTE / Wiley.
Christensen, K. B., Makransky, G., & Horton, M. (2017). Critical values for Yen's Q3: Identification of local dependence in the Rasch model using residual correlations. Applied Psychological Measurement, 41(3), 178-194. doi:10.1177/0146621616677520
plot_local_dependence_heatmap(), diagnose_mfrm()
toy <- load_mfrmr_data("example_core")
fit <- fit_mfrm(toy, "Person", c("Rater", "Criterion"), "Score",
method = "JML", maxit = 30)
q3 <- q3_statistic(fit)
q3$summary
# Reading the output: MaxAbsQ3 below 0.20 (the commonly cited cutoff
# attributed to Chen & Thissen, 1997) suggests little local dependence;
# values above 0.30 exceed the stricter convention (Marais, 2013,
# summarising the literature). For a formal test, use a parametric
# bootstrap (Christensen et al., 2017).
# The summary's flag counts give a quick triage; inspect `q3$pairs`
# for the offending level pairs and follow up with content review.
head(q3$pairs)