ictreg: Item Count Technique

In SensitiveQuestions/list: Statistical Methods for the Item Count Technique and List Experiment

Description

Function to conduct multivariate regression analyses of survey data with the item count technique, also known as the list experiment and the unmatched count technique.

Usage

```
ictreg(formula, data = parent.frame(), treat = "treat", J, method = "ml",
  weights, h = NULL, group = NULL, matrixMethod = "efficient",
  robust = FALSE, error = "none", overdispersed = FALSE,
  constrained = TRUE, floor = FALSE, ceiling = FALSE,
  ceiling.fit = "glm", floor.fit = "glm", ceiling.formula = ~1,
  floor.formula = ~1, fit.start = "lm", fit.sensitive = "glm",
  fit.nonsensitive = "nls", multi.condition = "none", maxIter = 5000,
  verbose = FALSE, ...)
```

Arguments

`formula` An object of class "formula": a symbolic description of the model to be fitted.

`data` A data frame containing the variables in the model.

`treat` Name of the treatment indicator as a string. For single sensitive item models, this refers to a binary indicator, and for multiple sensitive item models it refers to a multi-valued variable with zero representing the control condition. This can be an integer (with 0 for the control group) or a factor (with "control" for the control group).

`J` Number of non-sensitive (control) survey items.

`method` Method for regression: `ml` for Maximum Likelihood (ML) estimation with the Expectation-Maximization algorithm; `lm` for linear model estimation; or `nls` for Non-linear Least Squares (NLS) estimation with the two-step procedure.

`weights` Name of the weights variable as a string, if weighted regression is desired. Not implemented for the ceiling/floor models, the multiple sensitive item design, or the modified design.

`h` Auxiliary data functionality. Optional named numeric vector with length equal to the number of groups. Names correspond to group labels and values correspond to auxiliary moments.

`group` Auxiliary data functionality. Optional character vector of group labels with length equal to the number of observations.

`matrixMethod` Auxiliary data functionality. Procedure for estimating the optimal weighting matrix for the generalized method of moments. One of "efficient" for two-step feasible or "cue" for continuously updating. Default is "efficient". Only relevant if `h` and `group` are specified.

`robust` A logical value indicating whether to fit the robust NLS and ML models, which ensure that the estimated proportion of the sensitive trait is close to the difference-in-means estimate.

`error` Response error process to model in the ML models, as proposed in Blair, Chou, and Imai (2018). One of `none` (standard ML), the default; `topcoded`, which models an error process in which a random subset of respondents chooses the maximal (ceiling) response value, regardless of their truthful response; or `uniform`, which models an error process in which a subset of respondents chooses their responses at random.

`overdispersed` Indicator for the presence of overdispersion. If `TRUE`, the beta-binomial model is used in the EM algorithm; if `FALSE`, the binomial model is used. Not relevant for the `nls` or `lm` methods.

`constrained` A logical value indicating whether the control group parameters are constrained to be equal. Not relevant for the `nls` or `lm` methods.

`floor` A logical value indicating whether the floor liar model should be used to adjust for the possible presence of respondents dishonestly reporting a negative preference for the sensitive item among those who hold negative views of all the non-sensitive items.

`ceiling` A logical value indicating whether the ceiling liar model should be used to adjust for the possible presence of respondents dishonestly reporting a negative preference for the sensitive item among those who hold affirmative views of all the non-sensitive items.

`ceiling.fit` Fit method for the M step in the EM algorithm used to fit the ceiling liar model. `glm` uses standard logistic regression, while `bayesglm` uses logistic regression with a weakly informative prior over the parameters.

`floor.fit` Fit method for the M step in the EM algorithm used to fit the floor liar model. `glm` uses standard logistic regression, while `bayesglm` uses logistic regression with a weakly informative prior over the parameters.

`ceiling.formula` Covariates to include in the ceiling liar model. These must be a subset of the covariates used in `formula`.

`floor.formula` Covariates to include in the floor liar model. These must be a subset of the covariates used in `formula`.

`fit.start` Fit method for starting values for the standard design `ml` model. The options are `lm`, `glm`, and `nls`, which use OLS, logistic regression, and non-linear least squares to generate starting values, respectively. The default shown in the Usage signature is `lm`.

`fit.sensitive` Fit method for the sensitive item fit for maximum likelihood models. `glm` uses standard logistic regression, while `bayesglm` uses logistic regression with a weakly informative prior over the parameters.

`fit.nonsensitive` Fit method for the non-sensitive item fit for the `nls` method and the starting values for the `ml` method for the `modified` design. Options are `glm` and `nls`, and the default is `nls`.

`multi.condition` For the multiple sensitive item design, covariates representing the estimated count of affirmative responses for each respondent can be included directly as a level variable by choosing `level`, or as indicator variables for each value but one by choosing `indicators`. The default is `none`.

`maxIter` Maximum number of iterations for the Expectation-Maximization algorithm of the ML estimation. The default is 5000.

`verbose` A logical value indicating whether model diagnostics are printed out during fitting.

`...` Further arguments to be passed to NLS regression commands.
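The shapes expected by `h` and `group` can be easier to see on a toy example. The sketch below only builds the two objects as the argument descriptions state them (a named numeric vector with one entry per group, and a character vector with one label per respondent). The `survey` data frame, the `region` variable, and the numeric values are hypothetical and do not come from the package.

```r
# Hypothetical respondent-level data; `region` is an assumed grouping variable.
survey <- data.frame(
  region = c("north", "south", "south", "north", "east"),
  stringsAsFactors = FALSE
)

# `group`: one character label per observation.
group <- as.character(survey$region)

# `h`: one auxiliary moment per group, named by group label
# (the values here are made up for illustration).
h <- c(north = 0.12, south = 0.20, east = 0.15)

# Basic consistency checks before passing both objects to a model call.
stopifnot(is.character(group), length(group) == nrow(survey))
stopifnot(is.numeric(h), setequal(names(h), unique(group)))
```

Checking that every label in `group` has a matching name in `h` before fitting is a cheap way to catch misspelled group labels.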

Details

This function allows the user to perform regression analysis on data from the item count technique, also known as the list experiment and the unmatched count technique.
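As a rough illustration of the data structure behind the item count technique, the following base R sketch simulates a list experiment and computes the difference-in-means estimate of the sensitive item's prevalence. All quantities (sample size, response probabilities, variable names) are assumptions chosen for illustration, and the code does not use `ictreg` itself.

```r
set.seed(42)

n <- 20000
J <- 3                                    # number of control items

treat <- rbinom(n, 1, 0.5)                # 1 = list includes the sensitive item
control_count <- rbinom(n, J, 0.5)        # affirmative answers to control items
sensitive <- rbinom(n, 1, 0.3)            # latent sensitive trait, true rate 0.3

# Respondents report only the total count, never the individual answers.
y <- control_count + treat * sensitive

# Difference in mean counts between treatment and control groups.
est <- mean(y[treat == 1]) - mean(y[treat == 0])
se <- sqrt(var(y[treat == 1]) / sum(treat == 1) +
           var(y[treat == 0]) / sum(treat == 0))

round(c(estimate = est, std.error = se), 3)
```

Because treatment is randomized, the difference in means recovers the proportion holding the sensitive trait without any individual answer being observed.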

Three list experiment designs are accepted by this function: the standard design; the multiple sensitive item standard design; and the modified design proposed by Corstange (2009).

For the standard design, four estimators are implemented in this function: the linear model; the Maximum Likelihood (ML) estimator with the Expectation-Maximization (EM) algorithm; the nonlinear least squares (NLS) estimator with the two-step procedure (the latter two both proposed in Imai (2011)); and the ML estimator in the presence of two types of dishonest responses, "ceiling" and "floor" liars. The ceiling model, floor model, or both, as described in Blair and Imai (2012), can be activated by using the `ceiling` and `floor` options. The constrained and unconstrained ML models presented in Imai (2011) are available through the `constrained` option. For the models without liars, the `overdispersed` option controls whether a beta-binomial or a binomial model is used in the EM algorithm to model the item counts.
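The linear model estimator can be sketched in base R as an ordinary least squares regression of the item count on the covariates, the treatment indicator, and their interaction; the treatment terms then describe the sensitive item. This is a simplified illustration on simulated data with a single hypothetical binary covariate `high`, not the package's internal code.

```r
set.seed(7)

n <- 20000
J <- 3
x <- rnorm(n)
high <- as.numeric(x > 0)                       # hypothetical binary covariate
treat <- rbinom(n, 1, 0.5)

control_count <- rbinom(n, J, plogis(0.2 * x))  # control items depend on x
sensitive <- rbinom(n, 1, 0.3 + 0.1 * high)     # trait rate: 0.3 or 0.4
y <- control_count + treat * sensitive

dat <- data.frame(y = y, treat = treat, high = high)

# Main effects model the control item count; the `treat` and `high:treat`
# coefficients model the sensitive item.
fit <- lm(y ~ high * treat, data = dat)
sens <- coef(fit)[c("treat", "high:treat")]
round(sens, 3)
```

In this simulation the `treat` coefficient should be near 0.3 and the interaction near 0.1, matching the trait rates used to generate the data. With an intercept-only formula the same regression reduces to the difference-in-means estimator.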

The modified design and the multiple sensitive item design are automatically detected by the function, and only the binomial model without overdispersion is available.

Value

`ictreg` returns an object of class "ictreg". The function `summary` is used to obtain a table of the results. The object `ictreg` is a list that contains the following components. Some of these elements are not available, depending on which method is used (`lm`, `nls` or `ml`), which design is used (`standard`, `modified`), whether multiple sensitive items are included (`multi`), and whether the constrained model is used (`constrained = TRUE`).

`par.treat` point estimate for effect of covariate on item count fitted on treatment group

`se.treat` standard error for estimate of effect of covariate on item count fitted on treatment group

`par.control` point estimate for effect of covariate on item count fitted on control group

`se.control` standard error for estimate of effect of covariate on item count fitted on control group

`coef.names` variable names as defined in the data frame

`design` call indicating whether the `standard` design as proposed in Imai (2011) or the `modified` design as proposed in Corstange (2009) is used

`method` call of the method used

`overdispersed` call indicating whether data is overdispersed

`constrained` call indicating whether the constrained model is used

`boundary` call indicating whether the floor/ceiling boundary models are used

`multi` indicator for whether multiple sensitive items were included in the data frame

`call` the matched call

`data` the `data` argument

`x` the design matrix

`y` the response vector

`treat` the vector indicating treatment status

`J` number of non-sensitive (control) survey items set by the user or detected

`treat.labels` a vector of the names used by the `treat` vector for the sensitive item or items. These are the names from the `treat` indicator if it is a factor, or the number of the item if it is numeric.

`control.label` a vector of the names used by the `treat` vector for the control items. These are the names from the `treat` indicator if it is a factor, or the number of the item if it is numeric.

For the maximum likelihood models, an additional output object is included:

`pred.post` posterior predicted probability of answering "yes" to the sensitive item. These are the weights from the EM algorithm.
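To give a sense of how posterior weights of this kind arise, the sketch below computes the E-step probability of holding the sensitive trait for a deliberately simplified case: the standard design, a constrained binomial model for the control items, and no covariates. The function name `posterior_sensitive` and its arguments `g` (trait probability) and `p` (control item probability) are hypothetical, and this is not the package's implementation.

```r
# Posterior probability that a respondent holds the sensitive trait (Z = 1),
# given the observed count y, under a constrained binomial model.
posterior_sensitive <- function(y, treat, J, g, p) {
  h <- function(k) dbinom(k, size = J, prob = p)  # control count distribution

  # Control group: the count carries no information about Z, so the
  # posterior equals the prior probability g.
  w <- rep(g, length(y))

  # Treatment group: y arises either as (control count y - 1, Z = 1)
  # or as (control count y, Z = 0).
  t1 <- treat == 1
  num <- g * h(y[t1] - 1)
  den <- num + (1 - g) * h(y[t1])
  w[t1] <- num / den
  w
}

w <- posterior_sensitive(y = c(0, 1, 4, 2), treat = c(1, 1, 1, 0),
                         J = 3, g = 0.3, p = 0.5)
w
```

Treated respondents reporting 0 or J + 1 are fully revealed (weights 0 and 1), which is the same feature that motivates the floor and ceiling liar models.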

For the floor/ceiling models, several additional output objects are included:

`ceiling` call indicating whether the assumption of no ceiling liars is relaxed, and ceiling parameters are estimated

`par.ceiling` point estimate for effect of covariate on whether respondents who answered affirmatively to all non-sensitive items and hold a true affirmative opinion toward the sensitive item lied and reported a negative response to the sensitive item

`se.ceiling` standard error for estimate for effect of covariate on whether respondents who answered affirmatively to all non-sensitive items and hold a true affirmative opinion toward the sensitive item lied and reported a negative response to the sensitive item

`floor` call indicating whether the assumption of no floor liars is relaxed, and floor parameters are estimated

`par.floor` point estimate for effect of covariate on whether respondents who answered negatively to all non-sensitive items and hold a true affirmative opinion toward the sensitive item lied and reported a negative response to the sensitive item

`se.floor` standard error for estimate for effect of covariate on whether respondents who answered negatively to all non-sensitive items and hold a true affirmative opinion toward the sensitive item lied and reported a negative response to the sensitive item

`coef.names.ceiling` variable names from the ceiling liar model fit, if applicable

`coef.names.floor` variable names from the floor liar model fit, if applicable

For the multiple sensitive item design, the `par.treat` and `se.treat` vectors are returned as lists of vectors, one for each sensitive item.

For the unconstrained model, the `par.control` and `se.control` output is replaced by:

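`par.control.phi0` point estimate for effect of covariate on the control item count among respondents who would answer negatively to the sensitive item

`se.control.phi0` standard error for estimate of effect of covariate on the control item count among respondents who would answer negatively to the sensitive item

`par.control.phi1` point estimate for effect of covariate on the control item count among respondents who would answer affirmatively to the sensitive item

`se.control.phi1` standard error for estimate of effect of covariate on the control item count among respondents who would answer affirmatively to the sensitive item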

Depending upon the estimator requested by the user, model fit statistics are also included:

`llik` the log likelihood of the model, if `ml` is used

`resid.se` the residual standard error, if `nls` or `lm` are used. This will be a scalar if the standard design was used, and a vector if the multiple sensitive item design was used.

`resid.df` the residual degrees of freedom, if `nls` or `lm` are used. This will be a scalar if the standard design was used, and a vector if the multiple sensitive item design was used.

When using the auxiliary data functionality, the following objects are included:

`aux` logical value indicating whether estimation incorporates auxiliary moments

`nh` integer count of the number of auxiliary moments

`wm` procedure used to estimate the optimal weight matrix

`J.stat` numeric value of the Sargan-Hansen overidentifying restriction test statistic

`overid.p` corresponding p-value for the Sargan-Hansen test
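For readers unfamiliar with overidentification tests, the relationship between a J statistic and its p-value can be sketched as follows. The sketch relies on the standard result that the Sargan-Hansen statistic is asymptotically chi-squared with degrees of freedom equal to the number of overidentifying restrictions. The numeric values are hypothetical, and treating the number of restrictions as equal to the number of auxiliary moments is an assumption rather than something stated in this documentation.

```r
j_stat <- 4.2     # hypothetical value of the test statistic
n_overid <- 2     # assumed number of overidentifying restrictions

# Upper-tail chi-squared probability: a small p-value suggests that the
# auxiliary moments are inconsistent with the model's other moments.
p_value <- pchisq(j_stat, df = n_overid, lower.tail = FALSE)
round(p_value, 4)
```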

Author(s)

Graeme Blair, UCLA, and Kosuke Imai, Princeton University

References

Blair, Graeme and Kosuke Imai. (2012) “Statistical Analysis of List Experiments.” Political Analysis. Forthcoming. available at http://imai.princeton.edu/research/listP.html

Imai, Kosuke. (2011) “Multivariate Regression Analysis for the Item Count Technique.” Journal of the American Statistical Association, Vol. 106, No. 494 (June), pp. 407-416. available at http://imai.princeton.edu/research/list.html

See Also

`predict.ictreg` for fitted values
Examples

```
library(list)

data(race)

set.seed(1)

# Calculate list experiment difference in means

diff.in.means.results <- ictreg(y ~ 1, data = race,
  treat = "treat", J = 3, method = "lm")

summary(diff.in.means.results)

# Fit linear regression
# Replicates Table 1 Columns 1-2 Imai (2011); note that age is divided by 10

lm.results <- ictreg(y ~ south + age + male + college, data = race,
  treat = "treat", J = 3, method = "lm")

summary(lm.results)

# Fit two-step non-linear least squares regression
# Replicates Table 1 Columns 3-4 Imai (2011); note that age is divided by 10

nls.results <- ictreg(y ~ south + age + male + college, data = race,
  treat = "treat", J = 3, method = "nls")

summary(nls.results)

## Not run: 

# Additional example data sets from the package
data(multi)
data(affirm)

# Fit EM algorithm ML model with constraint
# Replicates Table 1 Columns 5-6, Imai (2011); note that age is divided by 10

ml.constrained.results <- ictreg(y ~ south + age + male + college, data = race,
  treat = "treat", J = 3, method = "ml",
  overdispersed = FALSE, constrained = TRUE)

summary(ml.constrained.results)

# Fit EM algorithm ML model with no constraint
# Replicates Table 1 Columns 7-10, Imai (2011); note that age is divided by 10

ml.unconstrained.results <- ictreg(y ~ south + age + male + college, data = race,
  treat = "treat", J = 3, method = "ml",
  overdispersed = FALSE, constrained = FALSE)

summary(ml.unconstrained.results)

# Fit EM algorithm ML model for multiple sensitive items
# Replicates Table 3 in Blair and Imai (2010)

multi.results <- ictreg(y ~ male + college + age + south + south:age,
  treat = "treat", J = 3, data = multi, method = "ml",
  multi.condition = "level")

summary(multi.results)

# Fit standard design ML model
# Replicates Table 7 Columns 1-2 in Blair and Imai (2010)

noboundary.results <- ictreg(y ~ age + college + male + south,
  treat = "treat", J = 3, data = affirm, method = "ml",
  overdispersed = FALSE)

summary(noboundary.results)

# Fit standard design ML model with ceiling effects alone
# Replicates Table 7 Columns 3-4 in Blair and Imai (2010)

ceiling.results <- ictreg(y ~ age + college + male + south,
  treat = "treat", J = 3, data = affirm, method = "ml",
  fit.start = "nls", ceiling = TRUE, ceiling.fit = "bayesglm",
  ceiling.formula = ~ age + college + male + south)

summary(ceiling.results)

# Fit standard design ML model with floor effects alone
# Replicates Table 7 Columns 5-6 in Blair and Imai (2010)

floor.results <- ictreg(y ~ age + college + male + south,
  treat = "treat", J = 3, data = affirm, method = "ml",
  fit.start = "glm", floor = TRUE, floor.fit = "bayesglm",
  floor.formula = ~ age + college + male + south)

summary(floor.results)

# Fit standard design ML model with floor and ceiling effects
# Replicates Table 7 Columns 7-8 in Blair and Imai (2010)

both.results <- ictreg(y ~ age + college + male + south,
  treat = "treat", J = 3, data = affirm, method = "ml",
  floor = TRUE, ceiling = TRUE,
  floor.fit = "bayesglm", ceiling.fit = "bayesglm",
  floor.formula = ~ age + college + male + south,
  ceiling.formula = ~ age + college + male + south)

summary(both.results)

# Response error models (Blair, Chou, and Imai 2018)

top.coded.error <- ictreg(y ~ age + college + male + south,
  treat = "treat", J = 3, data = race, method = "ml",
  error = "topcoded")

uniform.error <- ictreg(y ~ age + college + male + south,
  treat = "treat", J = 3, data = race, method = "ml",
  error = "uniform")

# Robust models, which constrain sensitive item proportion
# to difference-in-means estimate

robust.ml <- ictreg(y ~ age + college + male + south,
  treat = "treat", J = 3, data = affirm, method = "ml",
  robust = TRUE)

robust.nls <- ictreg(y ~ age + college + male + south,
  treat = "treat", J = 3, data = affirm, method = "nls",
  robust = TRUE)

## End(Not run)
```