In cknotz/conjointdatachecks: An R Package to Check Data from Conjoint Experiments

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

The purpose of this package

Conjoint survey experiments have become a popular tool among political scientists since Hainmueller et al. introduced the methodology in their 2014 article in Political Analysis and the associated cjoint package for R.

Thomas Leeper's cregg package then further expanded the toolkit for conjoint analysis, notably by providing functions to check the quality of data from conjoint experiments. One data quality problem that can arise are carryover effects. These arise when respondents go through several rounds in which they evaluate or choose vignettes, and when their ratings or choices in later rounds are influenced by what they saw in earlier rounds. A related problem are profile order effects which can appear when respondents compare two vignettes side-by-side and the effects of attributes differ depending on whether they were shown on the right- or left-hand-side. A third potential problem that can appear are randomization problems. These result when the random assignment of vignettes to respondents does not work and, say, male respondents were more likely to see some types of vignettes than female respondents. cregg allows users to identify these problems via a graphical inspection of the data.

However, often users also want to do formal statistical tests (or journal reviewers demand them). Hainmueller et al. (2014) describe procedures to conduct such formal tests, but these have -- to my knowledge -- not yet conveniently implemented in a package. Experienced users should of course not have any issues with running these tests themselves, but some others might.

This package provides two functions to quickly implement tests for carryover effects and randomization problems as well as methods for plotting and tidy exporting of the results.

Functions in `conjointdatachecks`

This package includes two functions so far (more will be added as time allows):

carryTest() performs a test for carryover effects, but can also be used to test for profile-order effects
dimRandoTest() performs a test for randomization problems

carryTest() in essence implements the procedure described by Hainmueller et al. (2019, 22): It estimates linear regression models (with cluster-robust standard errors, clustering by respondent) including one vignette attribute/dimension at a time interacted with an ID for the task number as predictors and then estimates an F-test to test for the joint significance of the interaction terms. A rejected null indicates carryover effects are present in the case of a particular vignette attribute. This function relies on the clubSandwich package.

Important: The carryTest() function should not be used on vignette attributes that are constrained by some other vignette attributes (where some combinations of attributes were excluded in the experimental design). Since carryTest() includes only one attribute at a time, it cannot account for attribute effects that depend on the values of other attributes.

dimRandoTest() tests if there are significant bivariate associations between a set of vignette attributes and a chosen respondent variable. The test differs depending on whether the respondent variable is metric or categorical. If the variable is metric, (e.g., income), the function runs a one-way ANOVA with the respondent-level variable as the dependent and each individual vignette attribute in a chosen set as the independent variable and compiles the results in a table. If the respondent-level variable is categorical (e.g., gender), the function cross-tabulates each vignette attribute with the respondent-level variable, computes a chi-squared test, and compiles the results in a table.

Both functions have custom S3 methods to quickly plot the results with plot() and to export them to a tidy data.frame for more customizable tabulation (e.g., with xtable) or plotting (e.g., with ggplot2).

Example analysis using data from the Hainmueller et al. immigration attitudes experiment

To show conjointdatachecks in action, I use the now well-known replication data from the Hainmueller et al. immigration attitudes experiment (see 2014, or also Hainmueller & Hopkins 2015).

First, the conjointdatachecks package can be installed from GitHub:

devtools::install_github("https://github.com/cknotz/conjointdatachecks")

The code chunk below then loads the conjointdatachecks package and the dataset from the cjoint package (which obviously needs to be installed as well). It also does some simple data transformations, and it saves the vignette attributes as a vector.

library(conjointdatachecks)

# Loading Hainmueller et al. immigrant experiment data (from cregg GitHub)
utils::download.file("https://github.com/leeper/cregg/raw/main/data/immigration.rda",
              "immigration")
load("immigration")

# Convert ID vars to factors
immigration$contest_no <- factor(immigration$contest_no)
immigration$CaseID <- factor(immigration$CaseID)
immigration$profile <- factor(immigration$profile)


# Select vignette attributes ("dimensions")
vigdims <- c("Gender","JobExperience",
             "JobPlans","PriorEntry","LanguageSkills")

Testing for carryover effects

The carryTest() function can be used as shown below:

carryTest(data = immigration,
          outcome = "ChosenImmigrant",
          attributes = "LanguageSkills",
          task = "contest_no",
          resID = "CaseID")

Here we test for carryover effects looking only at one particular vignette attribute, the fictional immigrant's language skills. As Hainmueller et al., the finding is that no carryover effects are present, but the p-value is slightly different (.48 instead of .52). This is due to a different model and test specification, and is shown in more detail in the replication vignette.

To run the test over all vignette attributes, we simply refer to the vigdims vector created above:

carry <- carryTest(data = immigration,
          outcome = "ChosenImmigrant",
          attributes = vigdims,
          task = "contest_no",
          resID = "CaseID")
carry

Interestingly, we now find a significant test in the case of the "Job Experience" attribute. A closer inspection (using the functionality of the cregg-package) would reveal that there seem to be indeed some differences in the AMCEs over the different tasks -- but nothing systematic (not shown). Thus, there is no reason to suggest that the Hainmueller et al. findings are incorrect, but this nevertheless highlights the importance of going over all vignette dimensions to avoid overlooking a significant effect.

The results of the carryTest function can also be plotted and exported to a tidy data.frame (for more customizable plotting or tabulating with ggplot2 or xtable for example) using custom plot() and as.data.frame() functions.

To plot the results, we simply run plot() on the stored results:

plot(carry,
     ltype = 5, # dashed line
     lcol = "red",# turn line red
     margins = c(5,8,1,2)) # adjust plot margins)

Exporting to a data.frame works the same way:

as.data.frame(carry)

Testing for profile order effects

Since a test for profile order effects also involves a linear hypothesis F-test, the carryTest function can also be used to test for profile order effects. To do so, we simply replace the task-ID by an indicator for the profile position:

carryTest(data = immigration,
                              outcome = "ChosenImmigrant",
                              attributes = vigdims,
                              task = "profile", # <- replaced here
                              resID = "CaseID")

Here we find significant effects in the cases of the "Gender" attribute. Looking at the example included in the vignette for the cregg package, some differences in the effects are visible (in the case of male immigrants but there is again no clear evidence for very strongly systematic patterns in the data that could give rise to concerns.

Testing for randomization problems

Since survey experiments are nowadays often administered online and the random assignment of experimental conditions is left to computers, randomization problems are not that likely to happen. Still, it can be important to check -- for example, where a survey and the experiment(s) included in it are administered not by researchers themselves but by external providers, researchers might want to run a quick initial check on the data to be able refuse (and avoid paying for!) problematic data.

dimRandoTest() makes it possible to do this quickly with data from conjoint experiments. The function runs fairly simple statistical tests for bivariate associations between a chosen respondent-level variable and a set of vignette attributes. The tests are as follows: If the respondent-level variable is metric, an ANOVA is estimated. If the respondent-level variable is categorical, the function uses a chi-squared test. In the latter case, the function will also add a note if there are cases where a chi-squared test is run and the minimum expected frequency of any cell in the corresponding contingency table is less than 5.

Important: If the experiment involved multiple rounds of rating or choice tasks per respondents, the test should be run one round at a time since the estimations do not take into account clustering.

If we are working with a metric respondent-level variable -- e.g., the ethnocentrism measure in the Hainmueller et al. experiment -- we declare this in the call to dimRandoTest() as shown below:

result <- dimRandoTest(data = immigration[which(immigration$contest_no==1),],
                        attributes = vigdims, # we use the same vignette attr. as before
                        resvar = "ethnocentrism",
                        vartype = "metric")

result

The note under the printed results confirms that the appropriate test is run, an ANOVA. More importantly, we do not find any significant associations: The type of vignettes respondents saw is entirely uncorrelated with their reported ethnocentrism.

To illustrate how dimRandoTest() works with a categorical respondent-level variable, we can dummy-code the ethnocentrism variable and then run the test with this variable:

# Split ethnocentrism (by median, following Hainmueller/Hopkins 2015)
immigration$ethno_bin <- factor(ifelse(immigration$ethnocentrism>median(immigration$ethnocentrism, na.rm = T),
                                        "High","Low"))

dimRandoTest(data = immigration[which(immigration$contest_no==1),],
                        attributes = vigdims, # we use the same vignette attr. as before
                        resvar = "ethno_bin",
                        vartype = "categorical")

The result is the same as before: We find no randomization problems.

As in the case of carryTest(), the results from dimRandoTest() can be quickly plotted:

result <- dimRandoTest(data = immigration[which(immigration$contest_no==1),],
                        attributes = vigdims, # we use the same vignette attr. as before
                        resvar = "ethno_bin",
                        vartype = "categorical")

plot(result,
     margins = c(5,8,2,2))

Alternatively, the results can also be exported to a tidy data.frame for custom tabulation or plotting:

tidyresult <- as.data.frame(result)
tidyresult

Next steps

The current version of the cjointdatachecks package is the first -- and it should be developed further. Any questions, comments, or suggestions for further improvement are more than welcome. Feel free to write to carlo.knotz@uis.no.

cknotz/conjointdatachecks documentation built on Feb. 11, 2023, 12:04 a.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

Tweet to @rdrrHQ

GitHub issue tracker

ian@mutexlabs.com