View source: R/api-simulation.R
evaluate_mfrm_signal_detection (R Documentation)
Evaluate DIF power and bias-screening behavior under known simulated signals
evaluate_mfrm_signal_detection(
n_person = c(30, 50, 100),
n_rater = c(4),
n_criterion = c(4),
raters_per_person = n_rater,
reps = 10,
group_levels = c("A", "B"),
reference_group = NULL,
focal_group = NULL,
dif_level = NULL,
dif_effect = 0.6,
bias_rater = NULL,
bias_criterion = NULL,
bias_effect = -0.8,
score_levels = 4,
theta_sd = 1,
rater_sd = 0.35,
criterion_sd = 0.25,
noise_sd = 0,
step_span = 1.4,
fit_method = c("JML", "MML"),
model = c("RSM", "PCM"),
step_facet = NULL,
maxit = 25,
quad_points = 7,
residual_pca = c("none", "overall", "facet", "both"),
sim_spec = NULL,
dif_method = c("residual", "refit"),
dif_min_obs = 10,
dif_p_adjust = "holm",
dif_p_cut = 0.05,
dif_abs_cut = 0.43,
bias_max_iter = 2,
bias_p_cut = 0.05,
bias_abs_t = 2,
seed = NULL
)
n_person: Vector of person counts to evaluate.
n_rater: Vector of rater counts to evaluate.
n_criterion: Vector of criterion counts to evaluate.
raters_per_person: Vector of rater assignments per person.
reps: Number of replications per design condition.
group_levels: Group labels used for DIF simulation. The first two levels define the default reference and focal groups.
reference_group: Optional reference group label used when extracting the target DIF contrast.
focal_group: Optional focal group label used when extracting the target DIF contrast.
dif_level: Target criterion level for the true DIF effect. Can be an integer index or a criterion label.
dif_effect: True DIF effect size added to the focal group on the target criterion.
bias_rater: Target rater level for the true interaction-bias effect. Can be an integer index or a rater label.
bias_criterion: Target criterion level for the true interaction-bias effect. Can be an integer index or a criterion label. Defaults to the last criterion level in each design.
bias_effect: True interaction-bias effect, in logits, added to the target rater-by-criterion cell.
score_levels: Number of ordered score categories.
theta_sd: Standard deviation of simulated person measures.
rater_sd: Standard deviation of simulated rater severities.
criterion_sd: Standard deviation of simulated criterion difficulties.
noise_sd: Optional observation-level noise added to the linear predictor.
step_span: Spread of step thresholds on the logit scale.
fit_method: Estimation method ("JML" or "MML") passed to the underlying model fitter.
model: Measurement model ("RSM" or "PCM") passed to the underlying model fitter.
step_facet: Step facet passed to the underlying model fitter.
maxit: Maximum iterations passed to the underlying model fitter.
quad_points: Number of quadrature points used for MML estimation.
residual_pca: Residual PCA mode passed to the diagnostic step.
sim_spec: Optional simulation specification used as the explicit data-generating mechanism (see Details).
dif_method: Differential-functioning method passed to analyze_dff().
dif_min_obs: Minimum observations per group cell required by analyze_dff().
dif_p_adjust: P-value adjustment method applied to the DIF contrasts (e.g. "holm").
dif_p_cut: P-value cutoff for counting a target DIF detection.
dif_abs_cut: Optional absolute contrast cutoff used when counting a target DIF detection. When omitted, the effective default is 0.43 logits.
bias_max_iter: Maximum iterations passed to estimate_bias().
bias_p_cut: P-value cutoff for counting a target bias screen-positive result.
bias_abs_t: Absolute t cutoff for counting a target bias screen-positive result.
seed: Optional seed for reproducible replications.
This function performs Monte Carlo design screening for two related tasks:
DIF detection via analyze_dff() and interaction-bias screening via
estimate_bias().
For each design condition (combination of n_person, n_rater,
n_criterion, raters_per_person), the function:
Generates synthetic data with simulate_mfrm_data()
Injects one known Group × Criterion DIF effect
(dif_effect logits added to the focal group on the target criterion)
Injects one known Rater × Criterion interaction-bias
effect (bias_effect logits)
Fits and diagnoses the MFRM
Runs analyze_dff() and estimate_bias()
Records whether the injected signals were detected or screen-positive
Detection criteria:
A DIF signal is counted as "detected" when the target contrast has
p < dif_p_cut and, when an absolute contrast cutoff is in
force, |Contrast| ≥ dif_abs_cut. For
dif_method = "refit", dif_abs_cut is interpreted on the logit scale.
For dif_method = "residual", the residual-contrast screening result is
used and the default is to rely on the significance test alone.
Bias results are different: estimate_bias() reports t and Prob. as
screening metrics rather than formal inferential quantities. Here, a bias
cell is counted as screen-positive only when those screening metrics are
available and satisfy
p < bias_p_cut and |t| ≥ bias_abs_t.
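The two counting rules above can be sketched as plain logical tests. The column names used here (p_value, contrast, t) are illustrative placeholders, not the package's actual output names:

```r
# Hypothetical replicate-level results for the target DIF contrast and the
# target bias cell (column names are illustrative, not mfrmr's).
dif_row  <- data.frame(p_value = 0.012, contrast = 0.55)
bias_row <- data.frame(p_value = 0.030, t = 2.4)

dif_p_cut  <- 0.05; dif_abs_cut <- 0.43  # defaults from the usage block
bias_p_cut <- 0.05; bias_abs_t  <- 2

# DIF: significance test plus (when in force) an absolute-contrast cutoff
dif_detected <- dif_row$p_value < dif_p_cut &&
  abs(dif_row$contrast) >= dif_abs_cut

# Bias: screen-positive only when both screening metrics are available
bias_positive <- !is.na(bias_row$p_value) && !is.na(bias_row$t) &&
  bias_row$p_value < bias_p_cut && abs(bias_row$t) >= bias_abs_t

dif_detected   # TRUE: 0.012 < 0.05 and |0.55| >= 0.43
bias_positive  # TRUE: 0.030 < 0.05 and |2.4| >= 2
```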
Power is the proportion of replications in which the target signal
was correctly detected. For DIF this is a conventional power summary.
For bias, the primary summary is BiasScreenRate, a screening hit rate
rather than formal inferential power.
False-positive rate is the proportion of non-target cells that were
incorrectly flagged. For DIF this is interpreted in the usual testing
sense. For bias, BiasScreenFalsePositiveRate is a screening rate and
should not be read as a calibrated inferential alpha level.
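Both summaries reduce to simple proportions, which can be sketched with made-up replicate-level flags (toy values, not package output):

```r
# Toy replicate-level outcomes (made-up values, not mfrmr output).
target_detected <- c(TRUE, TRUE, FALSE, TRUE, TRUE)  # target signal per replication
power <- mean(target_detected)                       # 4/5 = 0.8

# Non-target cells incorrectly flagged, per replication
nontarget_flagged <- c(2, 0, 1, 0, 0)
nontarget_cells   <- rep(12, length(nontarget_flagged))
false_positive_rate <- sum(nontarget_flagged) / sum(nontarget_cells)  # 3/60 = 0.05
```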
Default effect sizes: dif_effect = 0.6 logits corresponds to a
moderate criterion-linked differential-functioning effect; bias_effect = -0.8
logits represents a substantial rater-criterion interaction. Adjust
these to match the smallest effect size of practical concern for your
application.
This is a parametric simulation study. The function does not estimate a new design directly from one observed dataset. Instead, it evaluates detection or screening behavior under user-specified design conditions and known injected signals.
If you want to approximate a real study, choose the design grid and
simulation settings so that they reflect the empirical context of interest.
For example, you may set n_person, n_rater, n_criterion,
raters_per_person, and the latent-spread arguments to values motivated by
an existing assessment program, then study how operating characteristics
change as those design settings vary.
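As a sketch of that workflow, a grid bracketing an assessment program with roughly 100 examinees, 5 raters, double scoring, and 4 criteria might look as follows. The arguments are those documented above; reps, maxit, and the chosen values are illustrative and deliberately small:

```r
# Hedged sketch: a design grid motivated by an existing program.
# Runtime settings (reps, maxit) are kept tiny for illustration only.
grid_eval <- evaluate_mfrm_signal_detection(
  n_person          = c(60, 100, 150),  # bracket the empirical sample size
  n_rater           = 5,
  n_criterion       = 4,
  raters_per_person = 2,                # double scoring
  theta_sd          = 1,
  rater_sd          = 0.35,
  reps              = 2,
  maxit             = 10,
  seed              = 42
)
summary(grid_eval)
```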
When sim_spec is supplied, the function uses it as the explicit
data-generating mechanism for the latent spreads, thresholds, and assignment
archetype, while still injecting the requested target DIF and bias effects
for each design condition.
An object of class mfrm_signal_detection with:
design_grid: evaluated design conditions
results: replicate-level detection results
rep_overview: run-level status and timing
settings: signal-analysis settings
ademp: simulation-study metadata (aims, DGM, estimands, methods, performance measures)
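Given a returned object such as sig_eval from the Examples below, these components can be pulled off with standard list access (a sketch; only the component names listed above are assumed):

```r
# Components of an mfrm_signal_detection object (names per the Value section)
sig_eval$design_grid   # evaluated design conditions
sig_eval$results       # replicate-level detection results
sig_eval$rep_overview  # run-level status and timing
sig_eval$settings      # signal-analysis settings
sig_eval$ademp         # ADEMP simulation-study metadata
```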
The simulation logic follows the general Monte Carlo / operating-characteristic
framework described by Morris, White, and Crowther (2019) and the
ADEMP-oriented planning/reporting guidance summarized for psychology by
Siepe et al. (2024). In mfrmr, evaluate_mfrm_signal_detection() is a
many-facet screening helper specialized to DIF and interaction-bias use
cases; it is not a direct implementation of one published many-facet Rasch
simulation design.
Morris, T. P., White, I. R., & Crowther, M. J. (2019). Using simulation studies to evaluate statistical methods. Statistics in Medicine, 38(11), 2074-2102.
Siepe, B. S., Bartoš, F., Morris, T. P., Boulesteix, A.-L., Heck, D. W., & Pawel, S. (2024). Simulation studies for methodological research in psychology: A standardized template for planning, preregistration, and reporting. Psychological Methods.
simulate_mfrm_data(), evaluate_mfrm_design(), analyze_dff(), analyze_dif(), estimate_bias()
sig_eval <- suppressWarnings(evaluate_mfrm_signal_detection(
n_person = 20,
n_rater = 3,
n_criterion = 3,
raters_per_person = 2,
reps = 1,
maxit = 10,
bias_max_iter = 1,
seed = 123
))
s_sig <- summary(sig_eval)
s_sig$detection_summary[, c("n_person", "DIFPower", "BiasScreenRate")]