Introduction to heterogeneous effects analysis of conjoint experiments using cjbart
In cjbart: Heterogeneous Effects Analysis of Conjoint Experiments

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.width = 7, fig.height = 5
)

library(randomForestSRC)
options(rf.cores = 2)

set.seed(89)
library(ggplot2)
library(cjbart)

This vignette provides a brief example of how to estimate heterogeneous effects in conjoint experiments, using \code{cjbart}.

We first simulate a basic conjoint design for the purpose of illustration. Suppose we conducted a conjoint experiment on 100 individuals, where each individual makes 5 discrete choices between two profiles. Each profile has three attributes (A-E), and we also record two covariates for each subject. The resulting data contains 1000 observations.

subjects <- 100
rounds <- 5
profiles <- 2
obs <- subjects*rounds*profiles

fake_data <- data.frame(A = sample(c("a1","a2","a3"), obs, replace = TRUE),
                        B = sample(c("b1","b2","b3"), obs, replace = TRUE),
                        C = sample(c("c1","c2","c3"), obs, replace = TRUE),
                        D = sample(c("d1","d2","d3"), obs, replace = TRUE),
                        E = sample(c("e1","e2","e3"), obs, replace = TRUE),

                        covar1 = rep(runif(subjects, 0 ,1),
                                     each = rounds),
                        covar2 = rep(sample(c(1,0),
                                            subjects,
                                            replace = TRUE),
                                     each = rounds),

                        id1 = rep(1:subjects, each=rounds),
                        stringsAsFactors = TRUE)

fake_data$Y <- ifelse(fake_data$E == "e2",
                      rbinom(obs, 1, fake_data$covar1),
                      sample(c(0,1), obs, replace = TRUE))

Train a probit BART model on conjoint data

To estimate a conjoint model on this data we will use the cjbart() function, which uses Bayesian Additive Regression Trees (BART) to approximate the relationships between the outcome (a binary choice), the randomized attribute-levels, and the included covariates. For the sake of efficiency in this example, we reduce the number of trees (ntree) and cuts (numcut) used to estimate the BART model. More generally, users can pass arguments to the underlying BART algorithm (see ?BART::pbart) directly from the cjbart function to adjust the model hyperparameters:

cj_model <- cjbart(data = fake_data,
                   Y = "Y", 
                   id = "id1",
                   ntree = 15, numcut = 20)

To generate heterogeneity estimates at the observation- and individual-level, we use the IMCE() function. We pass in both the original experimental data, and the conjoint model generated in the last step. At this point, we also declare the names of the conjoint attribute variables (attribs) and the reference category we want to use for each attribute when calculating the marginal component effects (ref_levels). The vector of reference levels must be of the same length as attribs, with the level at position i corresponding to the attrib[i]:

het_effects <- IMCE(data = fake_data, 
                    model = cj_model, 
                    attribs = c("A","B","C","D","E"),
                    ref_levels = c("a1","b1","c1","d1","e1"),
                    cores = 1)

IMCE() returns an object of class "cjbart", which contains the individual-level marginal component effects (IMCEs), matrices reflecting the upper and lower values of each estimate's confidence interval respectively, and optionally the underlying observation-level marginal component effects (OMCEs) when keep_omces = TRUE.

It is likely that users will want to compare the conjoint estimation strategy to other estimation strategies, such as logistic regression models. We can recover the average marginal component effect (AMCE) for the BART model using the summary() command:

summary(het_effects)

We can plot the IMCEs, color coding the points by some covariate value, using the in-built plot() function:

plot(het_effects, covar = "covar1")

To aid presentation, the plot function can restrict which attribute-levels are displayed by using the plot_levels argument:

plot(het_effects, covar = "covar1", plot_levels = c("a2","a3","e2","e3"))

We can estimate the importance of covariates to the model using the het_vimp() function, which calculates standardized importance scores using random forest-based permutation tests. Calling plot() on the result will return a heatmap of these scores for each combination of attribute-level and covariate (users can restrict which attribute levels and covariates are considered, see the documentation for more details):

vimp_estimates <- het_vimp(imces = het_effects, cores = 1)
plot(vimp_estimates)

Supplying a single attribute-level, the same plot function will display the importance estimates with corresponding 95 percent confidence intervals:

plot(vimp_estimates, att_levels = "d3")

Finally, it is possible to estimate population IMCEs (pIMCEs) using pIMCE() This function requires a list of the marginal probabilities for each attribute in the population of interest, and can only be estimated for one attribute-level comparison at a time (due to the computational requirements):

fake_marginals <- list()
fake_marginals[["A"]] <- c("a1" = 0.33,"a2" = 0.33,"a3"=0.34)
fake_marginals[["B"]] <- c("b1" = 0.33,"b2" = 0.33,"b3" = 0.34)
fake_marginals[["C"]] <- c("c1" = 0.33,"c2" = 0.33, "c3" = 0.34)
fake_marginals[["D"]] <- c("d1" = 0.75,"d2" = 0.2,"d3" = 0.05)
fake_marginals[["E"]] <- c("e1" = 0.33,"e2" = 0.33,"e3" = 0.34)

# Reduced number of covariate data for sake of speed
fake_pimces <- pIMCE(covar_data = fake_data[fake_data$id1 %in% 1:3,
                                            c("id1","covar1","covar2")],
                     model = cj_model,
                     attribs = c("A","B","C","D","E"),
                     l = "E",
                     l_1 = "e2",
                     l_0 = "e1",
                     marginals = fake_marginals,
                     method = "bayes",
                     cores = 1)

head(fake_pimces)