View source: R/api-simulation.R
| simulate_mfrm_data | R Documentation |
Simulate long-format ordered many-facet data for design studies
simulate_mfrm_data(
n_person = 50,
n_rater = 4,
n_criterion = 4,
raters_per_person = n_rater,
design = NULL,
score_levels = 4,
theta_sd = 1,
rater_sd = 0.35,
criterion_sd = 0.25,
noise_sd = 0,
step_span = 1.4,
group_levels = NULL,
dif_effects = NULL,
interaction_effects = NULL,
seed = NULL,
model = c("RSM", "PCM", "GPCM"),
step_facet = "Criterion",
slope_facet = NULL,
thresholds = NULL,
slopes = NULL,
assignment = NULL,
sparse_controls = NULL,
sim_spec = NULL
)
| n_person | Number of persons/respondents. |
| n_rater | Number of rater facet levels. |
| n_criterion | Number of criterion/item facet levels. |
| raters_per_person | Number of raters assigned to each person. |
| design | Optional named design override supplied as a named list, named vector, or one-row data frame. When |
| score_levels | Number of ordered score categories. |
| theta_sd | Standard deviation of simulated person measures. |
| rater_sd | Standard deviation of simulated rater severities. |
| criterion_sd | Standard deviation of simulated criterion difficulties. |
| noise_sd | Standard deviation of optional observation-level noise added to the linear predictor (0 means no noise). |
| step_span | Spread of the default step thresholds on the logit scale: when thresholds = NULL, equally spaced thresholds run from -step_span/2 to step_span/2. |
| group_levels | Optional character vector of group labels. When supplied, a balanced |
| dif_effects | Optional data.frame describing true group-linked DIF effects. Must include |
| interaction_effects | Optional data.frame describing true non-group interaction effects. Must include at least one design column such as |
| seed | Optional random seed. |
| model | Measurement model recorded in the simulation setup. The current public generator supports |
| step_facet | Step facet used when |
| slope_facet | Slope facet used when |
| thresholds | Optional threshold specification. Use a numeric vector of common thresholds; a named list such as |
| slopes | Optional slope specification used when |
| assignment | Assignment design, such as "crossed", "rotating", "sparse_linked", "resampled", or "skeleton". |
| sparse_controls | Optional named list used when |
| sim_spec | Optional output from |
This function generates synthetic ordered many-facet data under RSM,
PCM, or the package's bounded GPCM branch.
The data-generating process is:
1. Draw person abilities: \theta_n \sim N(0, \texttt{theta\_sd}^2)
2. Draw rater severities: \delta_j \sim N(0, \texttt{rater\_sd}^2)
3. Draw criterion difficulties: \beta_i \sim N(0, \texttt{criterion\_sd}^2)
4. Generate evenly spaced step thresholds spanning \pm \texttt{step\_span}/2
5. For each observation, compute the linear predictor
   \eta = \theta_n - \delta_j - \beta_i + \epsilon, where
   \epsilon \sim N(0, \texttt{noise\_sd}^2) is optional noise
6. Compute category probabilities under the recorded measurement model
   (RSM, PCM, or bounded GPCM) and sample the response
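The steps above can be sketched in base R for the RSM case. This is an illustration under stated assumptions, not the package's implementation: scores are coded 0 to score_levels - 1, every person meets every rater (fully crossed), and rsm_probs() is a hypothetical helper written for this example.

```r
# Minimal base-R sketch of the RSM data-generating steps (not the package code).
# Assumptions: scores coded 0..(score_levels - 1), fully crossed design,
# and rsm_probs() is a hypothetical helper.
set.seed(123)
n_person <- 5; n_rater <- 4; n_criterion <- 4
score_levels <- 4; step_span <- 1.4; noise_sd <- 0

theta <- rnorm(n_person, 0, 1)        # person measures (theta_sd = 1)
delta <- rnorm(n_rater, 0, 0.35)      # rater severities (rater_sd = 0.35)
beta  <- rnorm(n_criterion, 0, 0.25)  # criterion difficulties (criterion_sd = 0.25)

# Evenly spaced step thresholds from -step_span/2 to +step_span/2
tau <- seq(-step_span / 2, step_span / 2, length.out = score_levels - 1)

# RSM: P(X = k) is proportional to exp(sum over h <= k of (eta - tau_h));
# the empty sum for k = 0 is 0.
rsm_probs <- function(eta, tau) {
  num <- exp(c(0, cumsum(eta - tau)))
  num / sum(num)
}

obs <- expand.grid(Person = seq_len(n_person), Rater = seq_len(n_rater),
                   Criterion = seq_len(n_criterion))
eta <- theta[obs$Person] - delta[obs$Rater] - beta[obs$Criterion] +
  rnorm(nrow(obs), 0, noise_sd)       # all-zero noise when noise_sd = 0
obs$Score <- vapply(eta, function(e)
  sample(0:(score_levels - 1), 1, prob = rsm_probs(e, tau)), integer(1))
head(obs)
```

With symmetric thresholds and \eta = 0, the category probabilities are symmetric, which is a quick sanity check on the sign convention \eta = \theta_n - \delta_j - \beta_i.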
Latent-value generation is explicit:
- latent_distribution = "normal" draws centered normal person, rater, and
  criterion values using the supplied standard deviations
- latent_distribution = "empirical" resamples centered support values
  recorded in sim_spec$empirical_support
- if sim_spec$population$active is TRUE, person measures are generated from
  the stored latent-regression population model and template person
  covariates rather than from theta_sd
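The difference between the "normal" and "empirical" branches can be sketched with a hypothetical draw_latent() helper. Two assumptions here: "centered" is read as "shifted to mean zero", and the support vector is an illustrative input standing in for one entry of sim_spec$empirical_support, not data from the package.

```r
# Hypothetical draw_latent() helper contrasting the two latent branches.
# "Centered" is read as "shifted to mean zero"; the package may define it
# differently. The support values below are illustrative inputs only.
draw_latent <- function(n, distribution = c("normal", "empirical"),
                        sd = 1, support = NULL) {
  distribution <- match.arg(distribution)
  if (distribution == "normal") {
    return(rnorm(n, mean = 0, sd = sd))   # centered normal draws
  }
  centered <- support - mean(support)     # center the stored support values
  centered[sample.int(length(centered), n, replace = TRUE)]  # resample
}

set.seed(1)
support <- c(-1.2, -0.4, 0.1, 0.9, 1.6)
draw_latent(5, "normal", sd = 0.35)
draw_latent(5, "empirical", support = support)
```

Indexing through sample.int() avoids base R's special case where sample() on a single number samples from 1:x.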
When dif_effects is supplied, the specified logit shift is added to
\eta for the focal group on the target facet level, creating a
known DIF signal. Similarly, interaction_effects injects a known
bias into specific facet-level combinations.
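A minimal sketch of the DIF injection described above: a known logit shift is added to \eta only for focal-group rows at the target level. The column names (Group, Criterion, shift) and values are assumptions for illustration; they are not the documented dif_effects schema.

```r
# Illustrative DIF injection: add a known logit shift to eta for the focal
# group at one target criterion. Column names are assumptions for this sketch.
obs <- data.frame(
  Group     = rep(c("Ref", "Focal"), each = 4),
  Criterion = rep(c("C1", "C2"), times = 4),
  eta       = 0
)
dif <- data.frame(Group = "Focal", Criterion = "C2", shift = 0.5)

hit <- obs$Group == dif$Group & obs$Criterion == dif$Criterion
obs$eta[hit] <- obs$eta[hit] + dif$shift   # only focal-group C2 rows move
obs
```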
The generator targets the common two-facet rating design (persons x raters
x criteria). raters_per_person controls the incomplete-block structure:
when it is less than n_rater, each person is assigned a rotating subset of
raters so that coverage stays balanced and reproducible.
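One simple way to build such a rotating subset is sketched below. The helper rotate_raters() is hypothetical and the package's exact offsets are not documented here; the sketch assumes raters_per_person does not exceed n_rater.

```r
# Hypothetical rotating assignment: person p gets raters_per_person
# consecutive raters starting at offset (p - 1) mod n_rater, wrapping around.
rotate_raters <- function(n_person, n_rater, raters_per_person) {
  do.call(rbind, lapply(seq_len(n_person), function(p) {
    start  <- (p - 1) %% n_rater
    raters <- ((start + seq_len(raters_per_person) - 1) %% n_rater) + 1
    data.frame(Person = p, Rater = raters)
  }))
}

assign_df <- rotate_raters(n_person = 8, n_rater = 4, raters_per_person = 2)
table(assign_df$Rater)  # 8 persons x 2 raters spread evenly: 4 per rater
```

Because the offset advances deterministically, the same inputs always give the same assignment, and coverage is exactly even whenever n_person is a multiple of n_rater.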
Threshold handling is intentionally explicit:
- if thresholds = NULL, common equally spaced thresholds are generated
  from step_span
- if thresholds is a numeric vector, it is used as one common threshold set
- if thresholds is a named list, numeric matrix, or data frame, threshold
  values may vary by StepFacet (currently Criterion or Rater)
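The three threshold cases can be sketched with a hypothetical resolve_thresholds() helper that returns one threshold vector per step-facet level. Only the NULL, numeric-vector, and named-list forms are covered; matrix and data-frame inputs are left out, and the package's internal handling may differ.

```r
# Hypothetical resolver for the threshold cases above (not the package code).
resolve_thresholds <- function(thresholds, facet_levels, score_levels = 4,
                               step_span = 1.4) {
  if (is.null(thresholds)) {
    # Case 1: common, equally spaced thresholds generated from step_span
    common <- seq(-step_span / 2, step_span / 2, length.out = score_levels - 1)
    return(setNames(rep(list(common), length(facet_levels)), facet_levels))
  }
  if (is.numeric(thresholds) && is.null(dim(thresholds))) {
    # Case 2: one numeric vector reused as the common threshold set
    return(setNames(rep(list(thresholds), length(facet_levels)), facet_levels))
  }
  if (is.list(thresholds) && !is.data.frame(thresholds)) {
    # Case 3 (named-list form): one threshold vector per step-facet level
    stopifnot(setequal(names(thresholds), facet_levels))
    return(thresholds[facet_levels])
  }
  stop("matrix and data-frame inputs are not covered by this sketch")
}

crit <- c("C1", "C2")
resolve_thresholds(NULL, crit)
resolve_thresholds(c(-1, 0, 1), crit)
resolve_thresholds(list(C2 = c(-0.5, 0, 0.8), C1 = c(-1, 0, 1)), crit)
```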
For bounded GPCM, the generator requires an explicit slope contract in
parallel with the threshold table. The current public branch keeps
slope_facet == step_facet, normalizes supplied slopes with the same
identification used by fit_mfrm() (slopes rescaled to a geometric mean of
one, so the log-slopes sum to zero), and uses the internal
category_prob_gpcm() helper for response sampling. Planning over arbitrary
facets remains restricted until that slope-aware contract is generalized
beyond the current role-based design, population-forecasting,
diagnostic-screening, and signal-detection helpers.
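The geometric-mean-one normalization, and one common GPCM parameterization, can be sketched as follows. Both normalize_slopes() and gpcm_probs() are hypothetical helpers written for this example; gpcm_probs() is not the package's internal category_prob_gpcm(), and the slope values are illustrative.

```r
# Divide slopes by their geometric mean so that mean(log(slopes)) = 0.
normalize_slopes <- function(a) {
  a / exp(mean(log(a)))
}

# One common GPCM form: P(X = k) proportional to
# exp(a * sum over h <= k of (eta - tau_h)); a = 1 reduces to the RSM form.
gpcm_probs <- function(eta, tau, a) {
  num <- exp(c(0, cumsum(a * (eta - tau))))
  num / sum(num)
}

slopes <- normalize_slopes(c(C1 = 0.8, C2 = 1.0, C3 = 1.5))
slopes
exp(mean(log(slopes)))  # geometric mean, 1 up to floating-point rounding
gpcm_probs(eta = 0.4, tau = c(-0.7, 0, 0.7), a = slopes[["C3"]])
```

Fixing the geometric mean rather than the arithmetic mean keeps the constraint symmetric on the log scale, where a slope of 2 and a slope of 0.5 are equally far from 1.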
Assignment handling is also explicit:
- "crossed" uses the full person x rater x criterion design
- "rotating" assigns a deterministic rotating subset of raters per person
- "sparse_linked" assigns most persons to an incomplete rater subset and
  assigns a configurable set of linking persons to a larger rater set
- "resampled" reuses empirical person-level rater profiles stored in
  sim_spec$assignment_profiles, optionally carrying over person-level
  Group labels
- "skeleton" reuses an observed person-by-rater-by-criterion response
  skeleton stored in sim_spec$design_skeleton, optionally carrying over
  the Group and Weight columns
Sparse linked simulation is intended for planned-missing rating designs in
which connectivity is maintained through common linking persons. The
returned mfrm_sparse_design attribute summarizes design density, planned
missingness, rater coverage, and rater-pair common-person counts. These
summaries are design diagnostics, not model-fit statistics or universal
adequacy thresholds. This branch follows sparse rater-mediated assessment
design work by Wind, Jones, and Grajeda (2023,
doi:10.1177/01466216231182148), Wind and Jones (2018,
doi:10.1177/0013164417703733), and DeMars, Shapovalov, and Hathcoat
(2023).
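The kinds of summaries listed above can be computed from a person-by-rater incidence matrix. The definitions in this sketch (density as the share of observed person-rater cells, common-person counts from crossprod()) are assumptions for illustration and may not match the columns of mfrm_sparse_design exactly; the small design itself is made up for the example.

```r
# Illustrative sparse linked design: most persons see 2 adjacent raters,
# and persons P1 and P2 act as linking persons rated by every rater.
n_person <- 10; n_rater <- 4
inc <- matrix(0L, n_person, n_rater,
              dimnames = list(paste0("P", 1:n_person), paste0("R", 1:n_rater)))
for (p in 3:n_person) inc[p, ((p + 0:1) %% n_rater) + 1] <- 1L
inc[1:2, ] <- 1L

design_density <- mean(inc)       # share of person-rater cells observed
planned_missing <- 1 - design_density
rater_coverage <- colSums(inc)    # persons seen by each rater
common_person <- crossprod(inc)   # rater-pair common-person counts
diag(common_person) <- NA         # diagonal is coverage, not a pair count
common_person
```

Here raters R1 and R3 never share a rotating person, so their only connection runs through the two linking persons; this is the connectivity role the linking set plays.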
For more controlled workflows, build a reusable simulation specification
first via build_mfrm_sim_spec() or derive one from an observed fit with
extract_mfrm_sim_spec(), then pass it through sim_spec.
Returned data include attributes:
- mfrm_truth: simulated true parameters (for parameter-recovery checks)
- mfrm_truth$signals: injected DIF and interaction signal tables
- mfrm_truth$slope_table: simulated discrimination table for bounded GPCM
- mfrm_population_data: generated one-row-per-person background data when
  the simulation specification stores an active latent-regression generator,
  including model-matrix xlevel and contrast provenance for categorical
  covariates
- mfrm_simulation_spec: generation settings (for reproducibility)
- mfrm_sparse_design: sparse-design diagnostics when
  assignment = "sparse_linked", including design density, planned missing
  rate, rater coverage, and rater-pair common-person counts
A long-format data.frame with core columns Study, Person,
two simulated non-person facet columns, and Score. By default those
facet columns are Rater and Criterion; when sim_spec records custom
public names, those names are used instead. If group labels are simulated
or reused from an observed response skeleton, a Group column is
included. If a weighted response skeleton is reused, a Weight column is
also included.
Higher theta values in mfrm_truth$person indicate higher person measures.
Higher values in mfrm_truth$facets$Rater indicate more severe raters.
Higher values in mfrm_truth$facets$Criterion indicate more difficult criteria.
mfrm_truth$signals$dif_effects and mfrm_truth$signals$interaction_effects
record any injected detection targets.
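The sign conventions above follow from \eta = \theta_n - \delta_j - \beta_i. A small sketch, assuming the RSM form with the default thresholds and scores coded 0 to 3 (rsm_expected() is a hypothetical helper), shows the expected score falling as rater severity rises.

```r
# Expected score under the RSM form; a more severe rater (larger delta)
# lowers eta and therefore the expected score.
rsm_expected <- function(eta, tau = c(-0.7, 0, 0.7)) {
  num <- exp(c(0, cumsum(eta - tau)))
  sum((seq_along(num) - 1) * num / sum(num))
}

theta <- 0.5; beta <- 0
sapply(c(lenient = -0.5, average = 0, severe = 0.5),
       function(delta) rsm_expected(theta - delta - beta))
```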
A typical workflow is:
1. Generate one design with simulate_mfrm_data().
2. Fit with fit_mfrm() and diagnose with diagnose_mfrm().
3. For repeated design studies, use evaluate_mfrm_design().
See also: evaluate_mfrm_design(), fit_mfrm(), diagnose_mfrm()
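# Simulate 40 persons, each scored by 2 of the 4 raters on 4 criteria
sim <- simulate_mfrm_data(
n_person = 40,
n_rater = 4,
n_criterion = 4,
raters_per_person = 2,
seed = 123
)
head(sim)                        # first rows of the long-format data
names(attr(sim, "mfrm_truth"))   # components of the stored true parameters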