RRgen: Generate randomized response data

View source: R/RRgen.R

Description


The method RRgen generates data according to a specified RR model, e.g., "Warner". True states are either provided by a vector trueState or drawn randomly from a Bernoulli distribution. Useful for simulation and testing purposes, e.g., power analysis.


Usage

RRgen(
  n,
  pi.true,
  model,
  p,
  complyRates = c(1, 1),
  sysBias = c(0, 0),
  groupRatio = 0.5,
  Kukrep = 1,
  trueState = NULL
)



Arguments

n: sample size of generated data

pi.true: true proportion in the population (a vector for the m-categorical "FR" or "custom" model)

model: specifies the RR model, one of: "Warner", "UQTknown", "UQTunknown", "Mangat", "Kuk", "FR", "Crosswise", "Triangular", "CDM", "CDMsym", "SLD", "mix.norm", "mix.exp", "custom". See vignette("RRreg") for details.

p: randomization probability (depending on model, see RRuni for details)

complyRates: vector with two values giving the proportions of carriers and non-carriers who adhere to the instructions, respectively

sysBias: probability of responding 'yes' (coded as 1) in case of non-compliance for carriers and non-carriers of the sensitive attribute, respectively. If sysBias=c(0,0), noncompliant carriers and non-carriers systematically give the nonsensitive response 'no' (also known as self-protective (SP) 'no' responses). If sysBias=c(0,0.5), noncompliant carriers always respond 'no', whereas noncompliant non-carriers randomly select a response category. Note that sysBias = c(0.5,0.5) might be the best choice for Kuk and Crosswise. For the m-categorical "FR" or "custom" model, sysBias can be given as a probability vector for categories 0 to (m-1).

groupRatio: proportion of participants in group 1. Only required for two-group models, e.g., SLD and CDM

Kukrep: number of repetitions of Kuk's procedure (how often red and black cards are drawn)

trueState: optional vector containing the true states of participants, which will be randomized according to the defined procedure: 1 for carriers and 0 for noncarriers of the sensitive attribute; for FR, values from 0,1,...,M-1 (M = number of response categories). If specified, n and pi.true are ignored.
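
To make the roles of n, pi.true, p and trueState concrete, the following base-R sketch mimics the generation step for a Warner-type design. It is an illustration under stated assumptions, not the package's implementation: the helper name simulate_warner is hypothetical, and p is assumed to be the probability of being directed to the sensitive statement (RRuni documents the definition the package actually uses).

```r
# Hypothetical helper (not part of RRreg): simulate Warner-type responses.
# Assumption: with probability p a respondent answers the sensitive statement,
# otherwise its negation; every respondent answers truthfully.
simulate_warner <- function(n, pi.true, p, trueState = NULL) {
  if (is.null(trueState)) {
    # draw true states from a Bernoulli distribution
    trueState <- rbinom(n, size = 1, prob = pi.true)
  }
  n <- length(trueState)  # n and pi.true are ignored if trueState is given
  sensitive <- runif(n) < p
  response <- ifelse(sensitive, trueState, 1 - trueState)
  data.frame(true = trueState, response = response)
}

set.seed(1)
dat <- simulate_warner(n = 1000, pi.true = 0.3, p = 0.7)
head(dat)

# expected proportion of 'yes' responses: p * pi + (1 - p) * (1 - pi)
expected_yes <- 0.7 * 0.3 + (1 - 0.7) * (1 - 0.3)
mean(dat$response)
```

Passing trueState instead of n and pi.true reuses a fixed set of true states, which is convenient when the same simulated population should be run through several designs.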


Details

If trueState is specified, the randomized response procedure is simulated for this vector; otherwise, a random vector of length n with true proportion pi.true is drawn. Response biases can be simulated by adjusting the compliance rates: if complyRates is set to c(1,1), all respondents adhere to the randomization procedure. If one or both rates are smaller than 1, sysBias determines whether noncompliant respondents systematically choose the nonsensitive category or answer randomly.

SLD - to generate data according to the stochastic lie detector with the proportion t of honest carriers, parameters are set to complyRates=c(t,1) and sysBias=c(0,0)

CDM - to generate data according to the cheating detection model with the proportion gamma of cheaters, parameters are set to complyRates=c(1-gamma,1-gamma) and sysBias=c(0,0)
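
The interplay of complyRates and sysBias described above can be sketched in base R as follows. This is a simplified illustration, not the code used by RRgen: the helper apply_noncompliance is hypothetical, and the responses under full compliance are replaced by a stand-in (the true states themselves) to keep the example short.

```r
# Hypothetical helper (not part of RRreg): overlay noncompliance on
# responses that were generated under full compliance.
apply_noncompliance <- function(true, response, complyRates = c(1, 1),
                                sysBias = c(0, 0)) {
  n <- length(true)
  # element 1 applies to carriers (true == 1), element 2 to noncarriers
  idx <- ifelse(true == 1, 1, 2)
  complies <- runif(n) < complyRates[idx]
  # noncompliers answer 'yes' (1) with probability sysBias
  biased <- rbinom(n, size = 1, prob = sysBias[idx])
  ifelse(complies, response, biased)
}

set.seed(2)
true <- rbinom(2000, 1, 0.2)
honest <- true  # stand-in for RR responses under full compliance

# SLD-style setting: 80% of carriers comply, all noncarriers comply,
# and noncompliers always respond 'no'
observed <- apply_noncompliance(true, honest, complyRates = c(0.8, 1),
                                sysBias = c(0, 0))
mean(observed[true == 1])  # should be near 0.8
```

Setting sysBias = c(0.5, 0.5) in the same sketch makes noncompliers answer at random instead of systematically choosing 'no'.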


Value

A data.frame including the variables true and response (and, for SLD and CDM, a third variable group)
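
Because the result holds both the true state and the randomized response, it can be used to check whether an estimator recovers the true proportion. The sketch below does this by hand for a Warner-type design. The data frame is built manually so the code runs without RRreg, and p is again assumed to be the probability of answering the sensitive statement; in practice the package's own estimation functions (e.g., RRuni) would be used instead.

```r
# Stand-in for a data frame with columns 'true' and 'response'
# (built by hand here so the sketch runs without RRreg).
set.seed(3)
p <- 0.7
true <- rbinom(5000, 1, 0.3)
response <- ifelse(runif(5000) < p, true, 1 - true)
dat <- data.frame(true = true, response = response)

# Moment estimator for a Warner-type design (requires p != 0.5):
# P(yes) = p * pi + (1 - p) * (1 - pi)  =>  pi = (P(yes) + p - 1) / (2 * p - 1)
lambda_hat <- mean(dat$response)
pi_hat <- (lambda_hat + p - 1) / (2 * p - 1)

# compare the estimate with the proportion of carriers in the sample
c(true_proportion = mean(dat$true), estimate = pi_hat)
```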

See Also

See vignette('RRreg') for a detailed description of the models, and RRlog, RRlin, and RRcor for the multivariate analysis of RR data.


Examples

# Generate responses of 1000 people according to Warner's model;
# every participant complies with the RR procedure
genData <- RRgen(n=1000, pi.true=.3, model="Warner", p=.7)

# use Kuk's model with two decks of cards, 
# p gives the proportions of red cards for carriers/noncarriers
genData <- RRgen(n=1000, pi.true=.4, model="Kuk", p=c(.4,.7))

# Stochastic Lie Detector (SLD):
# Only 80% of carriers answer according to the RR procedure
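# (complyRates = c(t, 1) with t = .8 and sysBias = c(0, 0), see Details)
genData <- RRgen(n=1000, pi.true=.2, model="SLD", p=c(.2,.8),
                 complyRates=c(.8,1), sysBias=c(0,0))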

danheck/RRreg documentation built on Nov. 13, 2022, 11:41 p.m.