gen_data: Simulate Truth Rating Data
In truthiness: Illusory Truth Longitudinal Study

Simulate Truth Rating Data

gen_data(
  nsubj,
  phase_eff = rep(0, 4),
  thresh = alpha_6_to_7(truthiness::clmm_maximal$alpha),
  subj_rfx = ordinal::VarCorr(truthiness::clmm_maximal)$subj_id,
  item_rfx = ordinal::VarCorr(truthiness::clmm_maximal)$item_id,
  dropout = c(0.05, 0.1, 0.1)
)

`nsubj`	Number of subjects. Because of counterbalancing, must be a multiple of 8.
`phase_eff`	A four-element vector giving the size of the illusory truth effect at each of the four phases (on the log odds scale). Use `rep(0, 4)` for testing Type I error rate. A value of .14 gives an effect of approximately 1/10 of a scale point.
`thresh`	Cut-points (thresholds) for the seven point scale (must be a six-element vector).
`subj_rfx`	A 4x4 covariance matrix with by-subject variance components for the intercept, main effect of repetition, main effect of interval, and repetition-by-interval interaction. Only the variances (elements on the diagonal) are used in the simulation (see Details).
`item_rfx`	A 4x4 covariance matrix with by-statement variance components for the intercept, main effect of repetition, main effect of interval, and repetition-by-interval interaction. Only the variances (elements on the diagonal) are used in the simulation (see Details).
`dropout`	A three-element vector giving the assumed proportion of subjects who drop out between consecutive testing phases (immediate, 1 day, 1 week, 1 month). The first element is the proportion of subjects who complete the immediate phase but drop out before the 1-day phase. The second element is the proportion of the remaining subjects who drop out after 1 day and before 1 week. The third element is the proportion of the remaining subjects who drop out after 1 week and before 1 month. For example, the default value of `c(0.05, 0.1, 0.1)` encodes dropout rates of 5%, 10%, and 10%.
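
Because each dropout proportion applies to the subjects still remaining, the rates compound across phases. The sketch below works through that arithmetic for the default values and a hypothetical `nsubj` of 256. It shows expected proportions only; whether `gen_data` removes exactly these proportions or samples dropouts at random is not stated here, so treat the numbers as expectations rather than guaranteed counts.

```r
# Default per-transition dropout proportions, as shown in the usage.
dropout <- c(0.05, 0.1, 0.1)

# Proportion of the original sample expected to remain at each phase:
# everyone is present at the immediate phase, then the retention
# rates (1 - dropout) multiply together.
retained <- c(1, cumprod(1 - dropout))
names(retained) <- c("immediate", "1 day", "1 week", "1 month")
retained

# Expected number of subjects per phase for a hypothetical sample of 256.
nsubj <- 256
expected_n <- nsubj * retained
expected_n
```

Under the defaults, roughly 77% of the original sample is expected to remain at the 1-month phase.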

By default, the thresholds and parameter estimates for variance components used in the simulation are from the cumulative link mixed model fit to the Nadarevic and Erdfelder data. Only the variances from the by-subject and by-item covariance matrices are used. Unlike Nadarevic and Erdfelder, who only had two testing intervals, the simulated study assumes four intervals, coded by three predictors for the main effect and three for the interaction with repetition. The code below depicts how the four-element variance vector from the original study is translated into the eight variances needed for the simulated data.

newvar_subj <- rep(diag(subj_rfx), c(1, 1, 3, 3))

newvar_item <- rep(diag(item_rfx), c(1, 1, 3, 3))
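
To make the mapping concrete, here is a minimal sketch using a made-up covariance matrix in place of the fitted one. The variance values and the labels attached to the eight elements are assumptions for illustration only; the labels follow the predictor names `R`, `I1`, `I2`, and `I3` from the returned data frame.

```r
# Hypothetical 4x4 covariance matrix; the variances are made up.
# Diagonal order: intercept, repetition, interval, repetition-by-interval.
subj_rfx <- diag(c(1.0, 0.2, 0.3, 0.1))

# Intercept and repetition variances are used once each; the interval and
# interaction variances are each reused for the three interval predictors.
newvar_subj <- rep(diag(subj_rfx), c(1, 1, 3, 3))

# Illustrative labels (not taken from the package).
names(newvar_subj) <- c("intercept", "R", "I1", "I2", "I3",
                        "R:I1", "R:I2", "R:I3")
newvar_subj
```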

The simulated data includes ratings for 128 stimulus items for each subject. Half of the statements are repeated (old) and half are new. A quarter of the items (32) are tested at each phase.
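
The item counts above can be checked with a short sketch that crosses repetition with interval. It assumes the old/new split is balanced within each phase, which the description implies but does not state outright; the object names are illustrative.

```r
# Cross the two design factors: 2 repetition levels x 4 intervals = 8 cells.
design <- expand.grid(
  repetition = c("old", "new"),
  interval = c("immediate", "1 day", "1 week", "1 month"),
  stringsAsFactors = FALSE
)

# Assumption: the 128 items are spread evenly over the 8 cells.
n_items <- 128
design$n_items <- n_items / nrow(design)
design

# Totals by factor recover the figures in the text:
# 64 items per repetition level and 32 items per phase.
per_repetition <- tapply(design$n_items, design$repetition, sum)
per_interval <- tapply(design$n_items, design$interval, sum)
per_repetition
per_interval
```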

It is assumed that the key effect present in the data is the interaction term, which is designed to represent an illusory-truth effect that first appears at the second testing interval (1 day) and remains over the subsequent two intervals without changing size. All other fixed effects in the model (main effect of R and three effects encoding the main effect of interval) are driven by the interaction term.

A data frame with `nsubj * 128` rows and 11 variables:

subj_id: Unique subject identifier.
list_id: Which set of statements the subject received.
stim_id: Unique stimulus (statement) identifier.
repetition: Whether the statement was old or new.
interval: Testing interval (immediate, 1 day, 1 week, 1 month).
eta: The simulated response tendency, on the log odds scale.
trating: The simulated rating value.
R: Deviation-coded predictor for repetition (old = 1/2, new = -1/2).
I1: Deviation-coded predictor for interval comparing baseline (immediate) to 1 day.
I2: Deviation-coded predictor for interval comparing baseline (immediate) to 1 week.
I3: Deviation-coded predictor for interval comparing baseline (immediate) to 1 month.
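
The relation between `eta`, the thresholds, and `trating` can be sketched with a cumulative logit model, the default link for cumulative link mixed models in the ordinal package. This is an illustration under stated assumptions, not the package's internal code: the threshold values are made up, and `rating_probs` is a hypothetical helper.

```r
# Hypothetical thresholds: six cut-points for a seven-point scale.
thresh <- c(-2.5, -1.5, -0.5, 0.5, 1.5, 2.5)

# Category probabilities under a cumulative logit model, where
# P(rating <= k) = plogis(thresh[k] - eta).
rating_probs <- function(eta, thresh) {
  cum <- c(plogis(thresh - eta), 1)  # cumulative probabilities for 1..7
  diff(c(0, cum))                    # probability of each single category
}

p0 <- rating_probs(0, thresh)     # no effect
p1 <- rating_probs(0.14, thresh)  # effect of 0.14 on the log odds scale

# Expected rating at each value of eta; the difference is the shift
# in scale points implied by these made-up thresholds.
mean0 <- sum(1:7 * p0)
mean1 <- sum(1:7 * p1)
mean1 - mean0

# Draw one simulated rating for eta = 0.14.
set.seed(1)
sample(1:7, size = 1, prob = p1)
```

With different thresholds the same shift in `eta` yields a different shift in scale points, so an effect size on the rating scale always depends on the `thresh` values in use.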

clmm_maximal, NE_exp1

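library(truthiness)  # provides gen_data() and clmm_maximal
library(dplyr)       # provides %>%, distinct(), and count()

# demonstrate how to convert from four variances to eight
four_var <- diag(ordinal::VarCorr(clmm_maximal)$subj_id)
four_var
rep(four_var, c(1, 1, 3, 3))

# basic usage (nsubj must be a multiple of 8)
dat <- gen_data(256)

# demonstrate deviation coding: one row per repetition-by-interval cell
dat %>% dplyr::distinct(repetition, interval, R, I1, I2, I3)

# demonstrate dropouts: number of subjects with data at each interval
dat %>% dplyr::distinct(subj_id, interval) %>% dplyr::count(interval)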