ictreg: Item Count Technique


ictreg R Documentation

Item Count Technique

Description

Function to conduct multivariate regression analyses of survey data with the item count technique, also known as the list experiment and the unmatched count technique.

Usage

ictreg(
  formula,
  data = parent.frame(),
  treat = "treat",
  J,
  method = "ml",
  weights,
  h = NULL,
  group = NULL,
  matrixMethod = "efficient",
  robust = FALSE,
  error = "none",
  overdispersed = FALSE,
  constrained = TRUE,
  floor = FALSE,
  ceiling = FALSE,
  ceiling.fit = "glm",
  floor.fit = "glm",
  ceiling.formula = ~1,
  floor.formula = ~1,
  fit.start = "lm",
  fit.sensitive = "glm",
  fit.nonsensitive = "nls",
  multi.condition = "none",
  maxIter = 5000,
  verbose = FALSE,
  ...
)

Arguments

formula

An object of class "formula": a symbolic description of the model to be fitted.

data

A data frame containing the variables in the model.

treat

Name of treatment indicator as a string. For single sensitive item models, this refers to a binary indicator, and for multiple sensitive item models it refers to a multi-valued variable with zero representing the control condition. This can be an integer (with 0 for the control group) or a factor (with "control" for the control group).
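
As a minimal sketch of the two encodings described above, the following builds a treatment indicator both ways. The data frame, the column names treat_f and y values, and the "sensitive" level label are hypothetical and chosen only for illustration; only the "control" label and the 0 value come from the description.

```r
# Hypothetical survey data: six respondents with item-count responses in y
survey <- data.frame(y = c(2, 3, 1, 4, 0, 2))

# Encoding 1: integer indicator, 0 = control group, 1 = treatment group
survey$treat <- c(0L, 1L, 0L, 1L, 0L, 1L)

# Encoding 2: factor indicator with "control" labelling the control group
# ("sensitive" is an arbitrary label for the treatment group)
survey$treat_f <- factor(ifelse(survey$treat == 0L, "control", "sensitive"),
                         levels = c("control", "sensitive"))

# Both encodings identify the same control respondents
same_controls <- identical(survey$treat == 0L, survey$treat_f == "control")
```

Either column name could then be passed as the treat string.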

J

Number of non-sensitive (control) survey items.

method

Method for regression: ml for maximum likelihood (ML) estimation with the Expectation-Maximization algorithm, lm for linear model estimation, or nls for nonlinear least squares (NLS) estimation with the two-step procedure.

weights

Name of the weights variable as a string, if weighted regression is desired. Not implemented for the ceiling/floor models, multiple sensitive item design, or for the modified design.

h

Auxiliary data functionality. Optional named numeric vector with length equal to number of groups. Names correspond to group labels and values correspond to auxiliary moments.

group

Auxiliary data functionality. Optional character vector of group labels with length equal to number of observations.

matrixMethod

Auxiliary data functionality. Procedure for estimating the optimal weighting matrix for the generalized method of moments. Either "efficient" for the two-step feasible procedure or "cue" for the continuously updating procedure. Default is "efficient". Only relevant if h and group are specified.
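
The shapes of h and group described above can be sketched as follows. The group labels and moment values are made up for illustration, and the checks at the end are only a suggested sanity test, not something the function is documented to perform.

```r
# Hypothetical group labels, one per observation (length = number of observations)
group <- c("north", "south", "south", "north", "south", "north")

# Hypothetical auxiliary moments, one per group; names match the group labels
h <- c(north = 0.15, south = 0.30)

# Suggested checks before fitting: every group has exactly one named moment
labels_match <- setequal(names(h), unique(group))
one_moment_per_group <- length(h) == length(unique(group))
```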

robust

A logical value indicating whether to fit the robust NLS or ML models, which ensure that the estimated proportion of the sensitive trait is close to the difference-in-means estimate.

error

Response error process to model in ML estimation, as proposed in Blair, Chou, and Imai (2018). One of none (standard ML), the default; topcoded, which models an error process in which a random subset of respondents chooses the maximal (ceiling) response value, regardless of their truthful response; or uniform, which models an error process in which a subset of respondents chooses their responses at random.
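
To make the two error processes concrete, here is a small simulation sketch for treatment-group respondents only. The 10% error rate, the response probabilities, and all variable names are arbitrary assumptions for illustration; this shows the data-generating processes, not how the package estimates them.

```r
set.seed(42)
J <- 3       # number of control items
n <- 1000    # treatment-group respondents

# Truthful responses: control-item count plus the sensitive item
truth <- rbinom(n, J, 0.5) + rbinom(n, 1, 0.3)

# A random subset of respondents (about 10%, an arbitrary rate) responds in error
err <- rbinom(n, 1, 0.10) == 1

# Top-coded error: those respondents report the maximal value J + 1
y_topcoded <- ifelse(err, J + 1, truth)

# Uniform error: those respondents pick a value uniformly from 0, ..., J + 1
y_uniform <- ifelse(err, sample(0:(J + 1), n, replace = TRUE), truth)
```

Top-coded error can only push responses upward, so it inflates the treatment-group mean, whereas uniform error pulls responses toward the middle of the scale.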

overdispersed

Indicator for the presence of overdispersion. If TRUE, the beta-binomial model is used in the EM algorithm; if FALSE, the binomial model is used. Not relevant for the NLS or lm methods.
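
The difference between the two count models can be illustrated with a short simulation. This is a sketch under assumed values (J = 3, mean probability 0.5, Beta shape parameters 2 and 2); it only shows why a beta-binomial count is more dispersed than a binomial count with the same mean.

```r
set.seed(1)
J <- 3
n <- 20000
p <- 0.5

# Binomial counts: every respondent shares the same success probability
y_binom <- rbinom(n, J, p)

# Beta-binomial counts: each respondent first draws a probability from a
# Beta(2, 2) distribution (mean 0.5), then a binomial count given that draw
p_i <- rbeta(n, 2, 2)
y_betabinom <- rbinom(n, J, p_i)

# Theoretical variances are 0.75 (binomial) and 1.05 (beta-binomial)
var_binom <- var(y_binom)
var_betabinom <- var(y_betabinom)
```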

constrained

A logical value indicating whether the control group parameters are constrained to be equal. Not relevant for the NLS or lm methods.

floor

A logical value indicating whether the floor liar model should be used to adjust for the possible presence of respondents dishonestly reporting a negative preference for the sensitive item among those who hold negative views of all the non-sensitive items.

ceiling

A logical value indicating whether the ceiling liar model should be used to adjust for the possible presence of respondents dishonestly reporting a negative preference for the sensitive item among those who hold affirmative views of all the non-sensitive items.
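
The following simulation sketches why ceiling liars matter. It assumes, as an extreme case, that every treated respondent who affirms all control items and holds the sensitive trait under-reports; the probabilities and variable names are invented for illustration.

```r
set.seed(7)
J <- 3
n <- 50000

control_count <- rbinom(n, J, 0.6)  # affirmative answers to the control items
z <- rbinom(n, 1, 0.4)              # true answer to the sensitive item
treat <- rbinom(n, 1, 0.5)          # 1 = list includes the sensitive item

# Responses if everyone answers honestly
y_honest <- control_count + treat * z

# Ceiling liars: treated respondents with all J control items affirmative and
# z = 1 who report J instead of J + 1
liar <- treat == 1 & control_count == J & z == 1
y_observed <- ifelse(liar, J, y_honest)

# Difference-in-means estimate of the sensitive-item proportion
dim_est <- function(y) mean(y[treat == 1]) - mean(y[treat == 0])
est_honest <- dim_est(y_honest)      # near the true proportion 0.4
est_observed <- dim_est(y_observed)  # biased downward by the liars
```

Floor liars can be simulated analogously by conditioning on control_count == 0.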

ceiling.fit

Fit method for the M step in the EM algorithm used to fit the ceiling liar model. glm uses standard logistic regression, while bayesglm uses logistic regression with a weakly informative prior over the parameters.

floor.fit

Fit method for the M step in the EM algorithm used to fit the floor liar model. glm uses standard logistic regression, while bayesglm uses logistic regression with a weakly informative prior over the parameters.

ceiling.formula

Covariates to include in ceiling liar model. These must be a subset of the covariates used in formula.

floor.formula

Covariates to include in floor liar model. These must be a subset of the covariates used in formula.

fit.start

Fit method for starting values for standard design ml model. The options are lm, glm, and nls, which use OLS, logistic regression, and non-linear least squares to generate starting values, respectively. The default is lm.

fit.sensitive

Fit method for the sensitive item fit for maximum likelihood models. glm uses standard logistic regression, while bayesglm uses logistic regression with a weakly informative prior over the parameters.

fit.nonsensitive

Fit method for the non-sensitive item fit for the nls method and the starting values for the ml method for the modified design. Options are glm and nls, and the default is nls.

multi.condition

For the multiple sensitive item design, covariates representing the estimated count of affirmative responses for each respondent can be included directly as a level variable by choosing level, or as indicator variables for each value but one by choosing indicators. The default is none.

maxIter

Maximum number of iterations for the Expectation-Maximization algorithm of the ML estimation. The default is 5000.

verbose

A logical value indicating whether model diagnostics are printed out during fitting.

...

Further arguments to be passed to NLS regression commands.

Details

This function allows the user to perform regression analysis on data from the item count technique, also known as the list experiment and the unmatched count technique.
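
As background, the basic quantity a list experiment identifies is the proportion holding the sensitive trait, estimated by the difference in mean item counts between treatment and control groups. The sketch below uses simulated data with made-up parameters and base R only; it is not the package's implementation.

```r
set.seed(11)
n <- 20000
J <- 3

treat <- rbinom(n, 1, 0.5)          # 1 = respondent sees the sensitive item
z <- rbinom(n, 1, 0.25)             # true sensitive-item answer (proportion 0.25)
y <- rbinom(n, J, 0.5) + treat * z  # reported item count

# Difference-in-means estimator of the sensitive-item proportion
dim_estimate <- mean(y[treat == 1]) - mean(y[treat == 0])

# The same number is the slope from regressing y on the treatment indicator
lm_slope <- unname(coef(lm(y ~ treat))["treat"])
```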

Three list experiment designs are accepted by this function: the standard design; the multiple sensitive item standard design; and the modified design proposed by Corstange (2009).

For the standard design, four estimators are implemented in this function: the linear model; Maximum Likelihood (ML) estimation with the Expectation-Maximization (EM) algorithm; nonlinear least squares (NLS) estimation with the two-step procedure; and the ML estimator in the presence of two types of dishonest responses, "ceiling" and "floor" liars. The ML and NLS estimators were both proposed in Imai (2010). The ceiling model, floor model, or both, as described in Blair and Imai (2010), can be activated by using the ceiling and floor options. The constrained and unconstrained ML models presented in Imai (2010) are available through the constrained option. For the models without liars, the overdispersed option controls whether a beta-binomial or binomial model is used in the EM algorithm to model the item counts.
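
One common way to write the linear estimator is as a regression of the item count on the covariates, the treatment indicator, and their interactions, so that the treatment-related coefficients describe the sensitive item. The sketch below illustrates that idea with base R on simulated data; the covariate, the parameter values, and the claim that this mirrors method = "lm" are assumptions for illustration rather than a statement of the package's internals.

```r
set.seed(3)
n <- 20000
J <- 3

male <- rbinom(n, 1, 0.5)
treat <- rbinom(n, 1, 0.5)

# Made-up sensitive-item prevalence: 0.2 for women, 0.5 for men
z <- rbinom(n, 1, 0.2 + 0.3 * male)
y <- rbinom(n, J, 0.5) + treat * z

# Coefficients on treat and male:treat describe the sensitive item;
# the intercept and male coefficient describe the control items
fit <- lm(y ~ male * treat)
sens_coefs <- coef(fit)[c("treat", "male:treat")]
```

Here the treat coefficient should be near 0.2 and the male:treat coefficient near 0.3, up to sampling noise.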

The modified design and the multiple sensitive item design are automatically detected by the function, and only the binomial model without overdispersion is available.

Value

ictreg returns an object of class "ictreg". The function summary is used to obtain a table of the results. The returned object is a list that contains the following components. Some of these elements are unavailable depending on which method is used (lm, nls or ml), which design is used (standard, modified), whether multiple sensitive items are included (multi), and whether the constrained model is used (constrained = TRUE).

par.treat

point estimate for effect of covariate on item count fitted on treatment group

se.treat

standard error for estimate of effect of covariate on item count fitted on treatment group

par.control

point estimate for effect of covariate on item count fitted on control group

se.control

standard error for estimate of effect of covariate on item count fitted on control group

coef.names

variable names as defined in the data frame

design

call indicating whether the standard design as proposed in Imai (2010) or the modified design as proposed in Corstange (2009) is used

method

call of the method used

overdispersed

call indicating whether data is overdispersed

constrained

call indicating whether the constrained model is used

boundary

call indicating whether the floor/ceiling boundary models are used

multi

indicator for whether multiple sensitive items were included in the data frame

call

the matched call

data

the data argument

x

the design matrix

y

the response vector

treat

the vector indicating treatment status

J

Number of non-sensitive (control) survey items set by the user or detected.

treat.labels

a vector of the names used by the treat vector for the sensitive item or items. These are the names from the treat indicator if it is a factor, or the number of the item if it is numeric.

control.label

a vector of the names used by the treat vector for the control items. These are the names from the treat indicator if it is a factor, or the number of the item if it is numeric.

For the maximum likelihood models, an additional output object is included:

pred.post

posterior predicted probability of answering "yes" to the sensitive item. These are the weights from the EM algorithm.
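
To illustrate what such a posterior probability looks like, the sketch below applies Bayes' rule for a treated respondent under a simplified binomial model with no covariates. The function name posterior_sensitive and the values of g and p are hypothetical; the package's own E step conditions on covariates and is not reproduced here.

```r
# Posterior probability that a treated respondent would answer "yes" to the
# sensitive item, given reported count y, prior probability g of "yes", and
# success probability p for each of the J control items
posterior_sensitive <- function(y, J, g, p) {
  num <- g * dbinom(y - 1, J, p)          # "yes": y - 1 control items affirmed
  den <- num + (1 - g) * dbinom(y, J, p)  # "no": y control items affirmed
  num / den
}

# Posterior for every possible treated response 0, ..., J + 1
post <- posterior_sensitive(0:4, J = 3, g = 0.3, p = 0.5)
```

A response of 0 implies a certain "no" and a response of J + 1 a certain "yes", which is why those two responses reveal the respondent's answer.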

For the floor/ceiling models, several additional output objects are included:

ceiling

call indicating whether the assumption of no ceiling liars is relaxed, and ceiling parameters are estimated

par.ceiling

point estimate for effect of covariate on whether respondents who answered affirmatively to all non-sensitive items and hold a true affirmative opinion toward the sensitive item lied and reported a negative response to the sensitive item

se.ceiling

standard error for estimate for effect of covariate on whether respondents who answered affirmatively to all non-sensitive items and hold a true affirmative opinion toward the sensitive item lied and reported a negative response to the sensitive item

floor

call indicating whether the assumption of no floor liars is relaxed, and floor parameters are estimated

par.floor

point estimate for effect of covariate on whether respondents who answered negatively to all non-sensitive items and hold a true affirmative opinion toward the sensitive item lied and reported a negative response to the sensitive item

se.floor

standard error for estimate for effect of covariate on whether respondents who answered negatively to all non-sensitive items and hold a true affirmative opinion toward the sensitive item lied and reported a negative response to the sensitive item

coef.names.ceiling

variable names from the ceiling liar model fit, if applicable

coef.names.floor

variable names from the floor liar model fit, if applicable

For the multiple sensitive item design, the par.treat and se.treat vectors are returned as lists of vectors, one for each sensitive item.

For the unconstrained model, the par.control and se.control output is replaced by:

par.control.phi0

point estimate for effect of covariate on control item count among respondents with a negative response to the sensitive item

se.control.phi0

standard error for estimate of effect of covariate on control item count among respondents with a negative response to the sensitive item

par.control.phi1

point estimate for effect of covariate on control item count among respondents with an affirmative response to the sensitive item

se.control.phi1

standard error for estimate of effect of covariate on control item count among respondents with an affirmative response to the sensitive item

Depending upon the estimator requested by the user, model fit statistics are also included:

llik

the log likelihood of the model, if ml is used

resid.se

the residual standard error, if nls or lm are used. This will be a scalar if the standard design was used, and a vector if the multiple sensitive item design was used

resid.df

the residual degrees of freedom, if nls or lm are used. This will be a scalar if the standard design was used, and a vector if the multiple sensitive item design was used

When using the auxiliary data functionality, the following objects are included:

aux

logical value indicating whether estimation incorporates auxiliary moments

nh

integer count of the number of auxiliary moments

wm

procedure used to estimate the optimal weight matrix

J.stat

numeric value of the Sargan-Hansen overidentifying restriction test statistic

overid.p

corresponding p-value for the Sargan-Hansen test
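
For orientation, a Sargan-Hansen statistic is conventionally compared against a chi-squared distribution whose degrees of freedom equal the number of overidentifying restrictions. The sketch below shows that calculation with made-up numbers; the statistic value, the degrees of freedom, and the variable names are assumptions, and how the package itself sets the degrees of freedom is not stated in this documentation.

```r
# Hypothetical test statistic and number of overidentifying restrictions
j_stat <- 4.2
df_overid <- 2

# Upper-tail chi-squared probability: the p-value of the overidentification test
overid_p <- pchisq(j_stat, df = df_overid, lower.tail = FALSE)
```

A small p-value would suggest that the auxiliary moments are inconsistent with the rest of the model.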

Author(s)

Graeme Blair, UCLA, graeme.blair@ucla.edu and Kosuke Imai, Princeton University, kimai@princeton.edu

References

Blair, Graeme and Kosuke Imai. (2012) “Statistical Analysis of List Experiments.” Political Analysis. Forthcoming. Available at http://imai.princeton.edu/research/listP.html

Imai, Kosuke. (2011) “Multivariate Regression Analysis for the Item Count Technique.” Journal of the American Statistical Association, Vol. 106, No. 494 (June), pp. 407-416. Available at http://imai.princeton.edu/research/list.html

See Also

predict.ictreg for fitted values

Examples



data(race)

set.seed(1)

# Calculate list experiment difference in means

diff.in.means.results <- ictreg(
   y ~ 1, data = race,
   treat = "treat", J = 3, method = "lm")

summary(diff.in.means.results)

# Fit linear regression
# Replicates Table 1 Columns 1-2 Imai (2011); note that age is divided by 10

lm.results <- ictreg(
   y ~ south + age + male + college, data = race,
   treat = "treat", J = 3, method = "lm")

summary(lm.results)

# Fit two-step non-linear least squares regression
# Replicates Table 1 Columns 3-4 Imai (2011); note that age is divided by 10

nls.results <- ictreg(
   y ~ south + age + male + college, data = race,
   treat = "treat", J = 3, method = "nls")

summary(nls.results)

## Not run: 

# Fit EM algorithm ML model with constraint
# Replicates Table 1 Columns 5-6, Imai (2011); note that age is divided by 10

ml.constrained.results <- ictreg(
   y ~ south + age + male + college, data = race,
   treat = "treat", J = 3, method = "ml",
   overdispersed = FALSE, constrained = TRUE)

summary(ml.constrained.results)

# Fit EM algorithm ML model with no constraint
# Replicates Table 1 Columns 7-10, Imai (2011); note that age is divided by 10

ml.unconstrained.results <- ictreg(
   y ~ south + age + male + college, data = race,
   treat = "treat", J = 3, method = "ml",
   overdispersed = FALSE, constrained = FALSE)

summary(ml.unconstrained.results)

# Fit EM algorithm ML model for multiple sensitive items
# Replicates Table 3 in Blair and Imai (2010)

multi.results <- ictreg(
   y ~ male + college + age + south + south:age, treat = "treat",
   J = 3, data = multi, method = "ml",
   multi.condition = "level")

summary(multi.results)

# Fit standard design ML model
# Replicates Table 7 Columns 1-2 in Blair and Imai (2010)

noboundary.results <- ictreg(
   y ~ age + college + male + south, treat = "treat",
   J = 3, data = affirm, method = "ml",
   overdispersed = FALSE)

summary(noboundary.results)

# Fit standard design ML model with ceiling effects alone
# Replicates Table 7 Columns 3-4 in Blair and Imai (2010)

ceiling.results <- ictreg(
   y ~ age + college + male + south, treat = "treat",
   J = 3, data = affirm, method = "ml", fit.start = "nls",
   ceiling = TRUE, ceiling.fit = "bayesglm",
   ceiling.formula = ~ age + college + male + south)

summary(ceiling.results)

# Fit standard design ML model with floor effects alone
# Replicates Table 7 Columns 5-6 in Blair and Imai (2010)

floor.results <- ictreg(
   y ~ age + college + male + south, treat = "treat",
   J = 3, data = affirm, method = "ml", fit.start = "glm",
   floor = TRUE, floor.fit = "bayesglm",
   floor.formula = ~ age + college + male + south)

summary(floor.results)

# Fit standard design ML model with floor and ceiling effects
# Replicates Table 7 Columns 7-8 in Blair and Imai (2010)

both.results <- ictreg(
   y ~ age + college + male + south, treat = "treat",
   J = 3, data = affirm, method = "ml",
   floor = TRUE, ceiling = TRUE,
   floor.fit = "bayesglm", ceiling.fit = "bayesglm",
   floor.formula = ~ age + college + male + south,
   ceiling.formula = ~ age + college + male + south)

summary(both.results)

# Response error models (Blair, Chou, and Imai 2018)

# Top-coded error process
top.coded.error <- ictreg(
   y ~ age + college + male + south, treat = "treat",
   J = 3, data = race, method = "ml", error = "topcoded")

# Uniform error process
uniform.error <- ictreg(
   y ~ age + college + male + south, treat = "treat",
   J = 3, data = race, method = "ml", error = "uniform")
   
# Robust models, which constrain sensitive item proportion
#   to difference-in-means estimate

robust.ml <- ictreg(
   y ~ age + college + male + south, treat = "treat",
   J = 3, data = affirm, method = "ml", robust = TRUE)

robust.nls <- ictreg(
   y ~ age + college + male + south, treat = "treat",
   J = 3, data = affirm, method = "nls", robust = TRUE)
   

## End(Not run)


list documentation built on May 29, 2024, 11:57 a.m.