RRlog: Logistic randomized response regression

Description

A dichotomous variable, measured once or more per person by a randomized response (RR) method, serves as the dependent variable in a logistic regression on one or more continuous and/or categorical predictors.

Usage

RRlog(
  formula,
  data,
  model,
  p,
  group,
  n.response = 1,
  LR.test = TRUE,
  fit.n = 3,
  EM.max = 1000,
  optim.max = 500,
  ...
)

Arguments

formula

formula specifying the regression model; see formula

data

optional data.frame in which the variables used in formula can be found

model

Available RR models: "Warner", "UQTknown", "UQTunknown", "Mangat", "Kuk", "FR", "Crosswise", "Triangular", "CDM", "CDMsym", "SLD", "custom". See vignette("RRreg") for details.

p

randomization probability/probabilities (depending on model, see RRuni for details)

group

vector specifying group membership. Can be omitted for single-group RR designs (e.g., Warner). For two-group RR designs (e.g., CDM or SLD), use 1 and 2 to indicate group membership, matching the respective randomization probabilities p[1] and p[2]. If both an RR design and a direct question (DQ) were used in the study, set the group indices to 0 (DQ) and 1 (RR; 1 or 2 for two-group RR designs). This makes it possible to test whether the RR design leads to a different prevalence estimate: include a dummy variable for the question format (RR vs. DQ) as a predictor. If the corresponding regression coefficient is significant, the prevalence estimates differ between RR and DQ. Interaction hypotheses (e.g., a correlation between a sensitive attribute and a predictor is found only with the RR design but not with the DQ design) can be tested in the same way by including the interaction of the DQ-RR dummy variable and the predictor in formula (e.g., RR ~ dummy*predictor).

n.response

number of responses per participant, e.g., if a participant responds to 5 RR questions with the same randomization probability p. Either a single number (if all participants give the same number of responses) or a vector with one entry per participant.

LR.test

test regression coefficients by a likelihood-ratio test, i.e., by fitting the model repeatedly while excluding one parameter at a time. Each nested model is fitted only once, so its fit may end in a local maximum. The likelihood-ratio test statistic G^2(df=1) is reported in the table of coefficients as deltaG2.

fit.n

number of fitting replications using random starting values to avoid local maxima

EM.max

maximum number of iterations of the EM algorithm. If EM.max=0, the EM algorithm is skipped.

optim.max

maximum number of iterations within each run of optim

...

ignored
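
The RR-versus-DQ comparison described under group can be sketched as follows. This is a minimal illustration with simulated data, not part of the package examples: the variable names (rr_format, predictor), the sample size, and the 50/50 split between DQ and RR are assumptions made for the sketch. The data are generated in base R, and the call to RRlog only runs if RRreg is installed.

```r
# Sketch under stated assumptions: half of the sample answers a direct
# question (group 0), the other half a Warner RR question (group 1).
set.seed(2)
n <- 1000
p <- 0.9                                   # assumed randomization probability
group <- rep(c(0, 1), each = n / 2)        # 0 = DQ, 1 = RR
true <- rbinom(n, 1, 0.3)                  # latent sensitive attribute
flip <- rbinom(n, 1, 1 - p)                # 1 = RR response is flipped
# DQ responses equal the true state; RR responses are flipped with prob. 1 - p
response <- ifelse(group == 1 & flip == 1, 1 - true, true)
dat <- data.frame(
  response  = response,
  group     = group,
  rr_format = group,                       # hypothetical dummy: 0 = DQ, 1 = RR
  predictor = rnorm(n)                     # hypothetical covariate
)

if (requireNamespace("RRreg", quietly = TRUE)) {
  # The coefficient of rr_format tests the prevalence difference between
  # formats; rr_format:predictor tests the interaction hypothesis.
  fit <- RRreg::RRlog(response ~ rr_format * predictor, data = dat,
                      model = "Warner", p = p, group = dat$group, fit.n = 1)
  print(summary(fit))
}
```

Because the simulated respondents answer honestly in both formats, neither the dummy nor the interaction should be expected to show a systematic effect here.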

Details

The logistic regression model is fitted first by an EM algorithm, in which the dependent RR variable is treated as a misclassified binary variable (Magder & Hughes, 1997). The results are used as starting values for a Newton-Raphson based optimization by optim.
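
The two-stage idea can be illustrated with a small base R sketch for the Warner model. It is not the package's internal code: the simulated data, the convergence tolerance, the use of glm with quasibinomial() for the M-step, and the choice of method = "BFGS" in optim are all assumptions made for illustration.

```r
# Sketch only: EM for logistic regression with a Warner-type RR outcome,
# followed by direct maximization of the observed-data likelihood.
set.seed(1)
n <- 2000
p <- 0.9                                      # assumed randomization probability
x <- rnorm(n)
true <- rbinom(n, 1, plogis(-1 + 0.8 * x))    # latent sensitive attribute
keep <- rbinom(n, 1, p)                       # 1 = response equals true state
resp <- ifelse(keep == 1, true, 1 - true)     # observed RR response

X <- cbind(1, x)
beta <- c(0, 0)                               # starting values
for (iter in 1:500) {
  # E-step: posterior probability that the true state is 1, given the response
  pi_i <- plogis(drop(X %*% beta))
  lik1 <- ifelse(resp == 1, p, 1 - p)         # P(resp | true = 1)
  lik0 <- ifelse(resp == 1, 1 - p, p)         # P(resp | true = 0)
  w <- pi_i * lik1 / (pi_i * lik1 + (1 - pi_i) * lik0)
  # M-step: logistic regression on the fractional outcome w
  beta_new <- unname(coef(glm(w ~ x, family = quasibinomial())))
  converged <- max(abs(beta_new - beta)) < 1e-8
  beta <- beta_new
  if (converged) break
}
em_estimate <- beta

# Second stage: minimize the negative observed-data log-likelihood,
# starting from the EM estimates
negll <- function(b) {
  pi_i <- plogis(drop(X %*% b))
  p_yes <- p * pi_i + (1 - p) * (1 - pi_i)    # P(resp = 1 | x)
  -sum(dbinom(resp, 1, p_yes, log = TRUE))
}
opt <- optim(em_estimate, negll, method = "BFGS", hessian = TRUE)
```

In this sketch the EM iterations and optim target the same likelihood, so opt$par should stay close to em_estimate; the Hessian returned by optim can then be inverted to obtain approximate standard errors.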

Value

Returns an object of class RRlog, which can be analysed by the generic method summary. In the table of coefficients, the column Wald refers to the Chi^2 test statistic, which is computed as Chi^2 = z^2 = Estimate^2/StdErr^2. If LR.test = TRUE, the column deltaG2 contains the likelihood-ratio test statistic, which is computed by fitting a nested logistic model without the corresponding predictor.
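
As a worked illustration of the two test statistics, the following base R sketch computes them from made-up numbers. The estimate, standard error, and log-likelihood values are hypothetical, and the formula deltaG2 = 2 * (logLik of full model - logLik of nested model) is the standard definition of a likelihood-ratio statistic, assumed here rather than taken from the package source.

```r
# Hypothetical values, for illustration only
estimate <- 0.42
std_err  <- 0.15

# Wald statistic: Chi^2 = z^2 = Estimate^2 / StdErr^2, with df = 1
wald   <- estimate^2 / std_err^2
p_wald <- pchisq(wald, df = 1, lower.tail = FALSE)

# Equivalent two-sided z test
z   <- estimate / std_err
p_z <- 2 * pnorm(-abs(z))

# Likelihood-ratio statistic from hypothetical log-likelihoods of the
# full model and of the nested model without the predictor
loglik_full   <- -512.3
loglik_nested <- -516.4
deltaG2 <- 2 * (loglik_full - loglik_nested)
p_lr    <- pchisq(deltaG2, df = 1, lower.tail = FALSE)
```

The Wald and z-test p-values coincide by construction, whereas the Wald and likelihood-ratio tests generally give similar but not identical results.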

Author(s)

Daniel W. Heck

References

van den Hout, A., van der Heijden, P. G., & Gilchrist, R. (2007). The logistic regression model with response variables subject to randomized response. Computational Statistics & Data Analysis, 51, 6060-6069.

See Also

anova.RRlog for model comparisons, plot.RRlog for plotting predicted regression curves, and vignette('RRreg') or https://www.dwheck.de/vignettes/RRreg.html for a detailed description of the RR models and the appropriate definition of p

Examples

# generate data set without biases
dat <- RRgen(1000, pi = .3, "Warner", p = .9)
dat$covariate <- rnorm(1000)
dat$covariate[dat$true == 1] <- rnorm(sum(dat$true == 1), .4, 1)
# analyse
ana <- RRlog(response ~ covariate, dat, "Warner", p = .9, fit.n = 1)
summary(ana)
# check with true, latent states:
glm(true ~ covariate, dat, family = binomial(link = "logit"))

danheck/RRreg documentation built on Dec. 3, 2022, 7:50 p.m.