In cnchapman/choicetools: Tools for Choice Modeling, Conjoint Analysis, and MaxDiff analysis of Best-Worst Surveys

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
library(choicetools)

Introduction to Choice-Based Conjoint Analysis

In this vignette, we demonstrate a simple choice-based conjoint (CBC) analysis project, end to end, using the choicetools package.

Choice-based conjoint is a variety of conjoint analysis in which respondents choose among several (typically 2-4) products that differ in various attributes such as brand, performance, and price. Each attribute will have two or more levels, such as brand name or specific prices. A complete product (aka 'concept') is a fully specified set of attributes and levels (Chapman & Feit, 2019).

For example, a Ford Truck with 300 HP engine at $35000 is an example of a product that 'conjoins' four attributes with specific levels: body style (with the level 'truck'), brand (Ford), engine size (300 HP), and price ($35000).

By randomizing the products' attributes and asking for repeated selection of preference, we can estimate the relative value (aka utility) of each attribute level. For instance, we can determine whether the engine size is more important than brand. We can also determine willingness to pay. For example, we can match the utility scores for engine size to those for price, to see how much more we might expect respondents to pay for a specific engine size.

Each respondent is asked to choose among a specific number of concepts (products) at a time, such as their preference among 3 choices. They do this repeatedly, typically 8-12 times (aka tasks).

The CBC tools here are probably most useful for didactic purposes --- to demonstrate and explain conjoint analysis in a classroom setting. A complete conjoint analysis study can be run with this code and a simple spreadsheet editing program, such as Google Sheets or Microsoft Office 365. (See notes at the end for details on that.) Beyond that, many CBC studies with simple design characteristics can be estimated using the tools.

Setup for a CBC study

In this vignette, we will imagine a hypothetical USB flash drive with five attributes: Brand (4 choices), Style (4 choices), Price (5 price points), Color (2 choices), and Size (4 choices). Further, we assume that we will ask for preference among 3 concepts at a time, asking each respondent to judge 12 concepts (sets of concepts), with a total of N=400 respondents

We set up the study by defining those parameters:

set.seed(98103)

cbc.tasks     <- 12   # trials per respondent
cbc.concepts  <- 3    # cards per trial
N             <- 400  # N of respondents

cbc.attrs     <- c(Brand=4, Style=4, Price=5, Color=2, Size=4)

Next, we can assign labels for the attribute levels. In the present code, this is a simple list, where you should ascertain that they are ordered correctly. We specify brands of "Alpha", "Bravo", and so forth; hypothetical
design styles of "Shiny", "Flat", etc; and the other attributes as follow:

cbc.levels    <- c("Alpha", "Bravo", "Charlie", "Delta",    # Brand
                   "Shiny", "Flat",  "Sparkly", "Odd",      # Style
                   "$9",  "$14",  "$19",  "$24",  "$29",    # Price
                   "Blue",  "Red",                          # Color
                   "8GB", "16GB", "32GB", "64GB")           # Size

We can create a product concept by randomly selecting one level for each of the attributes. Thus, we might have an Alpha brand Shiny 16GB Blue USB drive for \$19, or a Charlie Flat design, Red 64GB device for \$9. Each task in the survey will present 3 such randomly constructed concepts and ask the respondent to select the single concept that is most preferred.

Now we create an experimental design matrix for the study. The design matrix specifies which randomly-generated concepts appear together. To ensure the design is balanced and unbiased, we want each level to appear approximately the same number of times, and approximately the same number in combination with every other attribute. This is done with the generalMNLrandomTab() function:

cbc.tab <- generateMNLrandomTab(cbc.attrs, respondents=N, 
                                cards=cbc.concepts, trials=cbc.tasks )
knitr::kable(head(cbc.tab))

This function iteratively searches randomized designs and retains a design if it has improved balance vs. the previous best design. The resulting design matrix specifies the level of each attribute that might be shown on the survey.

In most cases, you would obtain this design matrix from the survey platform in which the survey was fielded. For instance, Sawtooth Software can export the "tab design" matrix in a compatible format. In that case, you do not need to generate a design matrix here; use that one instead. We generate the design here in order to field a survey for didactic purposes; see below.

We can also convert the "tab style" layout to a fully-extended, dummy coded matrix of 0 and 1 entries:

cbc.des <- convertSSItoDesign(cbc.tab)     # dummy coded matrix
knitr::kable(head(cbc.des, 9))

In this case, we see that the first product concept is a combination of Brand 2, Style 3, Price level 2, and so forth.

Given the design matrix, we can see how the survey choice tasks might appear (with minimal instruction). writeCBCdesignCSV() will produce a minimal "survey" in CSV format:

writeCBCdesignCSV(head(cbc.tab, 6), cards=3, trials=2, 
                  attr.list=cbc.attrs, lab.attrs=names(cbc.attrs),
                  lab.levels = cbc.levels)

By "fielding" such a survey in a spreadsheet editor --- such as Google Sheets or Microsoft Office --- we can easily demonstrate a CBC survey in a classroom setting. (To design a better looking and more robust survey for real respondents, use a survey authoring platform such as Sawtooth Software or Qualtrics.) We use the digest package (Eddelbuettel et al, 2018) to include a hash value for the design matrix as noted at the top of the CSV. This helps ensure that the collected data match the design matrix when choices are imported.

Aggregate Logit Model Estimation

The simplest form of estimation is an aggregate multinomial logit model (MNL). This estimates a simple multinomial (choice among options) model for the entire sample. The estimated coefficients are known as part worths ("part" because the values for multiple attributes are added together for the full utility of a concept). In an MNL model, the part worths sum to zero across the levels of each attribute.

To see how this works, we use generateRNDpws() to make up some zero-sum part worths, and then pickMNLwinningCards() to find the preferred concept based on those part worths for each choice set:

cbc.pws <- generateRNDpws(cbc.attrs)    # make up some zero-sum part worths

cbc.win <- pickMNLwinningCards(cbc.des, cbc.pws)          
knitr::kable(head(cbind(cbc.win, cbc.des), 9))

In the first set of choices -- the first three rows -- the second concept (Brand 1, Style 2, Price 3, etc.) was preferred and selected in the cbc.win vector.

Now we estimate the part worths based on those "observed" choices, and plot those estimates against the original part worths. We see a near perfect recovery of the part worth values (aside from minor rescaling of the magnitude):

cbc.mnl <- estimateMNLfromDesign(cbc.des, cbc.win, cards=cbc.concepts)
plot(cbc.pws, cbc.mnl)

Write CSV Structure and Read Choices from It

Of course it is more interesting to collect data from real individuals! Most of the time one would do that by fielding a survey online using survey authoring tools such as Sawtooth Software or Qualtrics. Such tools are able to display CBC tasks in a friendly way and require users to make tradeoffs among concepts.

For this vignette, however, we will write out the CSV file, and then read the data back:

csv.filename <- "~/Downloads/testCBC.csv"
writeCBCdesignCSV(cbc.tab, filename=csv.filename,   # filename="" for console
                  cards=3, trials=12, 
                  attr.list=cbc.attrs, lab.attrs=names(cbc.attrs),
                  lab.levels = cbc.levels, overwrite=TRUE)

# read the CSV 
csvfile.in  <- readLines(csv.filename)

If respondents had filled in choices, we could estimate the MNL model from them. Because they have not, we will make some random choices and rewrite the CSV:

# Fill in all of the choices with random choices (1, 2, or 3)
lines.with.choices <- which(grepl("CHOICE for Trial [0-9]+", csvfile.in))
csvfile.in[lines.with.choices] <- paste(csvfile.in[lines.with.choices], 
                                        sample(cbc.concepts, 
                                               length(lines.with.choices), 
                                               replace = TRUE))
writeLines(csvfile.in, con=csv.filename)

Now we have a CSV that might have been completed by respondents (if they were answering randomly). We read those data using readCBCchoices() with the design matrix (cbc.tab) and the filename. readCBCchoices() will check that the design hash for the file matches the design matrix that we give it.

# get those choices
cbc.choices <- readCBCchoices(cbc.tab, filename=csv.filename, 
                              cards=3, trials=12, verbose=FALSE)

We use estimateMNLfromDesign() to estimate the part worths from the observed data. We plot those and see that -- because the choices were assigned randomly -- the estimates are mostly close to zero, and have no correspondence to the original part worths:

# estimate partworths (should be near 0 because all choices were random)
cbc.mnl2 <- estimateMNLfromDesign(cbc.des, cbc.choices, cards=cbc.concepts,
                                  no.output = TRUE)
plot(cbc.pws, cbc.mnl2)
abline(h=0)

It is important to note that the estimateMNLfromDesign() function is not intended for production usage; it is primarily a didactic function whose NR algorithm is easy to explain (and to read in code). For actual analyses, we would use hierarchical Bayes estimation instead (Rossi, Allenby, and McCulloch, 2005).

Hierarchical Bayes estimation

Hierachical Bayes estimation is typically used for choice models to estimate a mixed effects model. The upper level comprises fixed effects estimates for the sample, while the lower level represents estimates for each individual respondent within the overall distribution. The individual-level estimates are based on the group of multiple observations for each respondent.

If you fielded the survey online, you would commonly receive responses as 1/0 for each row, indicating whether the item was chosen (similar to our cbc.win vector above). For exposition here, we will use the winning cards that were selected on the basis of our simulated part worths, given in cbc.win, but add a degree of randomness.

To estimate the hierarchical Bayes model, use the function estimateMNLfromDesignHB(). This function is primarily a wrapper that simplifies the data setup and calls ChoiceModelR::choicemodelr() (Sermas, 2012). Note that we use the tab format for the design matrix (as opposed to the 1/0 dummy coded, expanded version).

This is an iterative MCMC estimation routine. For speed, we specify 2000 total MCMC draws (burn-in and posterior); in practice, this would typically be 10000s, as the MCMC process will need a longer burn-in period for convergence. estimateMNLfromDesignHB() includes optional arguments to specify the proportion of the MCMC chain that is regarded as posterior draws (pitersUsed), whether to save the draws for each respondent (drawKeep) and the proportion of posterior draws to retain (skipping them in the sequence to avoid autocorrelation; drawKeepK).

# replace 30% of the perfect "winning" vector with random draws
cbc.win2    <- cbc.win
cbc.replace <- sample(length(cbc.win2), length(cbc.win2)*0.3)  # to replace
cbc.win2[cbc.replace] <- sample(3, length(cbc.replace), replace=TRUE)

# estimate using the design and winning cards
cbc.hb <- estimateMNLfromDesignHB(tmp.des=cbc.tab, tmp.win=cbc.win2, 
                                  kCards=cbc.concepts, kTrials=cbc.tasks,
                                  kResp=N , mcmcIters=2000)

The MCMC process shows a trace plot of the estimates for each attribute. The estimated part worths may be extracted from the HB model with the extractHBbetas() function. We extract those, add respondent IDs, and plot the values (Wickham, 2016; Wilke, 2018). We can plot each respondent as a separate faint point, allowing us to see the distribution of estimates; this is especially helpful when there are groups of respondents who show differing sets of preferences:

cbc.est        <- data.frame(extractHBbetas(cbc.hb, cbc.attrs))
names(cbc.est) <- cbc.levels
cbc.est$ID     <- 1:nrow(cbc.est)

library(ggplot2)
library(reshape2)
cbc.m <- melt(cbc.est, id.vars = "ID")

library(ggridges)
ggplot(data=cbc.m, aes(x=value, y=variable, group=variable)) +
      geom_density_ridges(scale=0.9, alpha=0, jittered_points=TRUE,
                          rel_min_height=0.005,
                          position="points_sina",
                          point_color = "blue", point_alpha=1/sqrt(N),
                          point_size=2.5) +
        ylab("Item") + 
        xlab("Relative preference (blue circles=individuals)") +
        ggtitle("Preference estimates: Distribution of individuals")

The ggridges plot is very useful to demonstrate the distribution of individuals in HB estimates. If these were real data, we could interpret them as showing a strong preference for Sparkly design, Red color, $14 price point, and so forth.

Estimated share of preference (aka "market share") may be estimated using the marketSim() function. Suppose we wanted to assess preference for a Flat 8GB Blue drive at \$9 from Alpha, vs. an Odd 64GB Red drive at \$24 from Bravo. We specify the sets of attributes for each product and compare them. In this case, we will use first choice preference, where each individual is regarded as "purchasing" the item with highest preference; other options in marketSim() include share of preference at the individual level, and first choice with randomization:

# cbc.levels    <- c("Alpha", "Bravo", "Charlie", "Delta",    # Brand 1-4
#                    "Shiny", "Flat",  "Sparkly", "Odd",      # Style 5-8
#                    "$9",  "$14",  "$19",  "$24",  "$29",    # Price 9-13
#                    "Blue",  "Red",                          # Color 14-15
#                    "8GB", "16GB", "32GB", "64GB")           # Size 16-19
prod1 <- c(6, 16, 14,  9, 1)     # Flat 8GB Blue $9 Alpha
prod2 <- c(8, 19, 15, 12, 2)     # Odd 64GB Red $24 Bravo
usb.pref  <- marketSim(
  cbc.est,                    # matrix of individual-level utilities
  list(prod1, prod2),         # list of products to compare
  use.none=FALSE,             # we have no "none" column
  style="first")              # estimate share by first-choice approach

# see the overall preference share for the two products
colMeans(usb.pref)

Between just these two products, we estimate that r round(colMeans(usb.pref)[1]*100)% of respondents would prefer product 1, the $9 flat 8GB drive from Alpha.

Classroom Usage

To use this package in a classroom setting, I suggest the following:

Cover the basic concepts of choice-based conjoint analysis
Show product attributes and how they are randomly selected for concept trials
Write out a CSV file, and assign respondent numbers to students
Share the CSV and have students complete their blocks in the shared document
Download the spreadsheet CSV, read the answers, and estimate the results
Share the results and discuss them

In classroom settings, I usually find that the individual level plots lead to interesting discussion. Disagreement, such as one respondent feeling that that her answers are not reflected in the distribution, is rare. Such discussion is often helpful to answer questions about the efficacy of choice modeling methods.

Features Beyond CBC

The choicetools package includes support for many other features of CBC models and related marketing analyses that are beyond the scope of this vignette. Those include:

MaxDiff / Best-Worst Scaling, with support to import models from Sawtooth Software and Qualtrics. Unlike the largely didactic support for CBC models, the MaxDiff features are intended to be production quality. Cf. Bahna & Chapman (2018).
Composite Perceptual Maps for brand positioning.
Experimental CBC models to assess attribute importance. With inspiration from random forest variable importance methods, this method omits an attribute and examines the change in predictive validity under oblation to determine the importance of that attribute.

References

Bahna, E., and Chapman, CN (2018). Constructed, Augmented MaxDiff. In B. Orme, ed., Proceedings of the 2018 Sawtooth Software Conference, Orlando, FL.

Chapman, CN, and Feit, EMF (2019). R for Marketing Research and Analytics, 2nd ed. Chapter 13: Choice Modeling. New York: Springer.

Eddelbuettel, Dirk, with contributions by Antoine Lucas, Jarek Tuszynski, Henrik Bengtsson, Simon Urbanek, Mario Frasca, Bryan Lewis, Murray Stokely, Hannes Muehleisen, Duncan Murdoch, Jim Hester, Wush Wu, Qiang Kou, Thierry Onkelinx, Michel Lang, Viliam Simko, Kurt Hornik and Radford Neal. (2018). digest: Create Compact Hash Digests of R Objects. R package version 0.6.18. https://CRAN.R-project.org/package=digest

Rossi, PE, Allenby, GM, and McCulloch, RE (2005). Bayesian Statistics and Marketing. New York: Wiley.

Sermas, Ryan (2012). ChoiceModelR: Choice Modeling in R. R package version 1.2. https://CRAN.R-project.org/package=ChoiceModelR

Wickham, H. (2016). ggplot2: Elegant Graphics for Data Analysis. New York: Springer-Verlag.

Wilke, CO. (2018). ggridges: Ridgeline Plots in 'ggplot2'. R package version 0.5.1. https://CRAN.R-project.org/package=ggridges

cnchapman/choicetools documentation built on May 28, 2023, 9:14 a.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

cnchapman/choicetools
Tools for Choice Modeling, Conjoint Analysis, and MaxDiff analysis of Best-Worst Surveys

In cnchapman/choicetools: Tools for Choice Modeling, Conjoint Analysis, and MaxDiff analysis of Best-Worst Surveys

Introduction to Choice-Based Conjoint Analysis

Setup for a CBC study

Aggregate Logit Model Estimation

Write CSV Structure and Read Choices from It

Hierarchical Bayes estimation

Classroom Usage

Features Beyond CBC

References

R Package Documentation

Browse R Packages

We want your feedback!

cnchapman/choicetools Tools for Choice Modeling, Conjoint Analysis, and MaxDiff analysis of Best-Worst Surveys

In cnchapman/choicetools: Tools for Choice Modeling, Conjoint Analysis, and MaxDiff analysis of Best-Worst Surveys

Introduction to Choice-Based Conjoint Analysis

Setup for a CBC study

Aggregate Logit Model Estimation

Write CSV Structure and Read Choices from It

Hierarchical Bayes estimation

Classroom Usage

Features Beyond CBC

References

R Package Documentation

Browse R Packages

We want your feedback!

cnchapman/choicetools
Tools for Choice Modeling, Conjoint Analysis, and MaxDiff analysis of Best-Worst Surveys