evaluate_mfrm_signal_detection: Evaluate DIF power and bias-screening behavior under known...
In mfrmr: Estimation and Diagnostics for Many-Facet Measurement Models

evaluate_mfrm_signal_detection

R Documentation

Evaluate DIF power and bias-screening behavior under known simulated signals

Description

Evaluate DIF power and bias-screening behavior under known simulated signals

Usage

evaluate_mfrm_signal_detection(
  n_person = c(30, 50, 100),
  n_rater = c(4),
  n_criterion = c(4),
  raters_per_person = n_rater,
  design = NULL,
  reps = 10,
  group_levels = c("A", "B"),
  reference_group = NULL,
  focal_group = NULL,
  dif_level = NULL,
  dif_effect = 0.6,
  bias_rater = NULL,
  bias_criterion = NULL,
  bias_effect = -0.8,
  score_levels = 4,
  theta_sd = 1,
  rater_sd = 0.35,
  criterion_sd = 0.25,
  noise_sd = 0,
  step_span = 1.4,
  fit_method = c("JML", "MML"),
  model = c("RSM", "PCM", "GPCM"),
  step_facet = NULL,
  slope_facet = NULL,
  slopes = NULL,
  maxit = 25,
  quad_points = 7,
  residual_pca = c("none", "overall", "facet", "both"),
  sim_spec = NULL,
  dif_method = c("residual", "refit"),
  dif_min_obs = 10,
  dif_p_adjust = "holm",
  dif_p_cut = 0.05,
  dif_abs_cut = 0.43,
  bias_max_iter = 2,
  bias_p_cut = 0.05,
  bias_abs_t = 2,
  seed = NULL
)

Arguments

`n_person`	Vector of person counts to evaluate.
`n_rater`	Vector of rater counts to evaluate.
`n_criterion`	Vector of criterion counts to evaluate.
`raters_per_person`	Vector of rater assignments per person.
`design`	Optional named design-grid override supplied as a named list, named vector, or one-row data frame. Names may use canonical variables (`n_person`, `n_rater`, `n_criterion`, `raters_per_person`), current public aliases implied by `sim_spec` (for example `n_judge`, `n_task`, `judge_per_person`), or role keywords (`person`, `rater`, `criterion`, `assignment`). Values may be vectors. The schema-only future branch input `design$facets = c(person = ..., judge = ..., task = ...)` is also accepted for the currently exposed facet keys. Do not specify the same variable through both `design` and the scalar design-grid arguments.
`reps`	Number of replications per design condition.
`group_levels`	Group labels used for DIF simulation. The first two levels define the default reference and focal groups.
`reference_group`	Optional reference group label used when extracting the target DIF contrast.
`focal_group`	Optional focal group label used when extracting the target DIF contrast.
`dif_level`	Target criterion level for the true DIF effect. Can be an integer index or a criterion label such as `"C04"`. Defaults to the last criterion level in each design.
`dif_effect`	True DIF effect size added to the focal group on the target criterion.
`bias_rater`	Target rater level for the true interaction-bias effect. Can be an integer index or a label such as `"R04"`. Defaults to the last rater level in each design.
`bias_criterion`	Target criterion level for the true interaction-bias effect. Can be an integer index or a criterion label. Defaults to the last criterion level in each design.
`bias_effect`	True interaction-bias effect added to the target `⁠Rater x Criterion⁠` cell.
`score_levels`	Number of ordered score categories.
`theta_sd`	Standard deviation of simulated person measures.
`rater_sd`	Standard deviation of simulated rater severities.
`criterion_sd`	Standard deviation of simulated criterion difficulties.
`noise_sd`	Optional observation-level noise added to the linear predictor.
`step_span`	Spread of step thresholds on the logit scale.
`fit_method`	Estimation method passed to `fit_mfrm()`.
`model`	Measurement model passed to `fit_mfrm()`. Bounded `GPCM` is supported with caveats as slope-aware signal-detection sensitivity evidence.
`step_facet`	Step facet passed to `fit_mfrm()` when `model = "PCM"` or `model = "GPCM"`. When left `NULL`, the function inherits the generator step facet from `sim_spec` when available and otherwise defaults to `"Criterion"`.
`slope_facet`	Slope facet passed to `fit_mfrm()` when `model = "GPCM"`. Defaults to the fitted step facet.
`slopes`	Optional bounded-`GPCM` slope specification used by direct simulation calls when `sim_spec = NULL`.
`maxit`	Maximum iterations passed to `fit_mfrm()`.
`quad_points`	Quadrature points for `fit_method = "MML"`.
`residual_pca`	Residual PCA mode passed to `diagnose_mfrm()`.
`sim_spec`	Optional output from `build_mfrm_sim_spec()` or `extract_mfrm_sim_spec()` used as the base data-generating mechanism. When supplied, the design grid still varies `n_person`, `n_rater`, `n_criterion`, and `raters_per_person`, but latent spread, thresholds, and other generator settings come from `sim_spec`. The target DIF and interaction-bias signals specified in this function override any signal tables stored in `sim_spec`. If `sim_spec` stores an active latent-regression population generator, this helper currently requires `fit_method = "MML"` so each replication can refit the population model.
`dif_method`	Differential-functioning method passed to `analyze_dff()`.
`dif_min_obs`	Minimum observations per group cell for `analyze_dff()`.
`dif_p_adjust`	P-value adjustment method passed to `analyze_dff()`.
`dif_p_cut`	P-value cutoff for counting a target DIF detection.
`dif_abs_cut`	Optional absolute contrast cutoff used when counting a target DIF detection. When omitted, the effective default is `0.43` for `dif_method = "refit"` and `0` (no additional magnitude cutoff) for `dif_method = "residual"`.
`bias_max_iter`	Maximum iterations passed to `estimate_bias()`.
`bias_p_cut`	P-value cutoff for counting a target bias screen-positive result.
`bias_abs_t`	Absolute t cutoff for counting a target bias screen-positive result.
`seed`	Optional seed for reproducible replications.

Details

This function performs Monte Carlo design screening for two related tasks: DIF detection via analyze_dff() and interaction-bias screening via estimate_bias().

For each design condition (combination of n_person, n_rater, n_criterion, raters_per_person), the function:

Generates synthetic data with simulate_mfrm_data()
Injects one known Group \times Criterion DIF effect (dif_effect logits added to the focal group on the target criterion)
Injects one known Rater \times Criterion interaction-bias effect (bias_effect logits)
Fits and diagnoses the MFRM
Runs analyze_dff() and estimate_bias()
Records whether the injected signals were detected or screen-positive

Bounded-GPCM runs preserve the current package constraint slope_facet == step_facet within the generator and fitted model. The resulting DIF and bias rates are slope-aware screening summaries, not formal inferential power, alpha calibration, operational scoring, or arbitrary-facet planning evidence.

Detection criteria: A DIF signal is counted as "detected" when the target contrast has p < dif_p_cut and, when an absolute contrast cutoff is in force, |\mathrm{Contrast}| \ge dif_abs_cut. For dif_method = "refit", dif_abs_cut is interpreted on the logit scale. For dif_method = "residual", the residual-contrast screening result is used and the default is to rely on the significance test alone.

Bias results are different: estimate_bias() reports t and Prob. as screening metrics rather than formal inferential quantities. Here, a bias cell is counted as screen-positive only when those screening metrics are available and satisfy

p < bias_p_cut and |t| \ge bias_abs_t.

Power is the proportion of replications in which the target signal was correctly detected. For DIF this is a conventional power summary. For bias, the primary summary is BiasScreenRate, a screening hit rate rather than formal inferential power.

False-positive rate is the proportion of non-target cells that were incorrectly flagged. For DIF this is interpreted in the usual testing sense. For bias, BiasScreenFalsePositiveRate is a screening rate and should not be read as a calibrated inferential alpha level.

Default effect sizes: dif_effect = 0.6 logits corresponds to a moderate criterion-linked differential-functioning effect; bias_effect = -0.8 logits represents a substantial rater-criterion interaction. Adjust these to match the smallest effect size of practical concern for your application.

This is again a parametric simulation study. The function does not estimate a new design directly from one observed dataset. Instead, it evaluates detection or screening behavior under user-specified design conditions and known injected signals.

If you want to approximate a real study, choose the design grid and simulation settings so that they reflect the empirical context of interest. For example, you may set n_person, n_rater, n_criterion, raters_per_person, and the latent-spread arguments to values motivated by an existing assessment program, then study how operating characteristics change as those design settings vary.

When sim_spec is supplied, the function uses it as the explicit data-generating mechanism for the latent spreads, thresholds, and assignment archetype, while still injecting the requested target DIF and bias effects for each design condition.

If that specification also stores a latent-regression population generator, each replication carries simulated one-row-per-person background data into the MML fit. This remains a screening-oriented Monte Carlo study; it is not a person-level posterior prediction for one observed sample.

Value

An object of class mfrm_signal_detection with:

design_grid: evaluated design conditions. When sim_spec carries custom public facet names, matching design-variable alias columns are included alongside the canonical internal columns.
results: replicate-level detection results, with the same design-variable alias columns when applicable.
rep_overview: run-level status and timing, with the same design-variable alias columns when applicable.
design_descriptor: role-based design-variable metadata used by planning summaries and plots
planning_scope: explicit record of the current planning contract
planning_constraints: explicit record of which design variables remain mutable under the current simulation specification
planning_schema: combined planner-schema contract bundling the role descriptor, scope boundary, and current mutability map
gpcm_boundary: bounded-GPCM caveat row when a GPCM screening route is used
settings: signal-analysis settings
ademp: simulation-study metadata (aims, DGM, estimands, methods, performance measures)
notes: short interpretation notes

References

The simulation logic follows the general Monte Carlo / operating-characteristic framework described by Morris, White, and Crowther (2019) and the ADEMP-oriented planning/reporting guidance summarized for psychology by Siepe et al. (2024). In mfrmr, evaluate_mfrm_signal_detection() is a many-facet screening helper specialized to DIF and interaction-bias use cases; it is not a direct implementation of one published many-facet Rasch simulation design.

Morris, T. P., White, I. R., & Crowther, M. J. (2019). Using simulation studies to evaluate statistical methods. Statistics in Medicine, 38(11), 2074-2102.
Siepe, B. S., Bartos, F., Morris, T. P., Boulesteix, A.-L., Heck, D. W., & Pawel, S. (2024). Simulation studies for methodological research in psychology: A standardized template for planning, preregistration, and reporting. Psychological Methods.

Examples


sig_eval <- suppressWarnings(evaluate_mfrm_signal_detection(
  design = list(person = 8, rater = 2, criterion = 2, assignment = 1),
  reps = 1,
  maxit = 30,
  bias_max_iter = 1,
  seed = 123
))
s_sig <- summary(sig_eval)
s_sig$overview

mfrmr documentation built on June 13, 2026, 1:07 a.m.