diffs: Preference Heterogeneity Diagnostics
In leeper/cregg: Simple Conjoint Tidying, Analysis, and Visualization

Tests for preference heterogeneity in conjoint experiments

amce_diffs(
  data,
  formula,
  by,
  id = ~0,
  weights = NULL,
  feature_order = NULL,
  feature_labels = NULL,
  level_order = c("ascending", "descending"),
  alpha = 0.05,
  ...
)

cj_anova(data, formula, id = NULL, weights = NULL, by = NULL, ...)

mm_diffs(
  data,
  formula,
  by,
  id = ~0,
  weights = NULL,
  feature_order = NULL,
  feature_labels = NULL,
  level_order = c("ascending", "descending"),
  alpha = 0.05,
  h0 = 0,
  ...
)

`data`	A data frame containing variables specified in `formula`. All RHS variables should be factors; the base level for each will be used in estimation and for AMCEs the base level's AMCE will be NA. Optionally, this can instead be an object of class “survey.design” returned by `svydesign`.
`formula`	A formula specifying a model to be estimated. All variables should be factors; all levels across features should be unique.
`by`	A formula containing only RHS variables, specifying grouping factors over which to perform estimation. For `amce_diffs`, this can be a factor or something coercible to factor. For `mm_diffs`, differences are calculated against the base level of this variable.
`id`	Ignored.
`weights`	An (optional) RHS formula specifying a variable holding survey weights.
`feature_order`	An (optional) character vector specifying the names of feature (RHS) variables in the order they should be encoded in the resulting data frame.
`feature_labels`	A named list of “fancy” feature labels to be used in output. By default, the function looks for a “label” attribute on each variable in `formula` and uses that for pretty printing. This argument overrides those attributes or otherwise provides fancy labels for this purpose. This should be a list with names equal to variables on the righthand side of `formula` and character string values; arguments passed here override variable attributes.
`level_order`	A character string specifying whether levels (within each feature) should be ordered increasing or decreasing in the final output. This is mostly consequential only for plotting via `plot.cj_mm`, etc.
`alpha`	A numeric value indicating the significance level at which to calculate confidence intervals for the estimates (by default 0.05, meaning 95-percent CIs are returned).
`...`	Additional arguments to `amce`, `cj_freqs`, or `mm`.
`h0`	A numeric value specifying a null hypothesis value to use when generating z-statistics and p-values (only used for `mm_diffs`).
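
As a rough illustration of how `alpha` and `h0` enter the reported statistics, the following base R sketch computes a z-statistic, a two-sided p-value, and a normal-approximation confidence interval. The estimate and standard error are made-up numbers, and the formulas are the textbook ones. They are assumed here for illustration, not taken from the package source.

```r
# Hypothetical estimate and standard error, for illustration only
estimate  <- 0.04
std_error <- 0.015

alpha <- 0.05  # significance level, so the interval has 1 - alpha coverage
h0    <- 0     # null hypothesis value

# z-statistic and two-sided p-value against h0
z <- (estimate - h0) / std_error
p <- 2 * stats::pnorm(-abs(z))

# Normal-approximation confidence interval at level 1 - alpha
crit  <- stats::qnorm(1 - alpha / 2)
lower <- estimate - crit * std_error
upper <- estimate + crit * std_error

round(c(z = z, p = p, lower = lower, upper = upper), 4)
```

Setting `alpha = 0.1` in this sketch would narrow the interval to 90-percent coverage, and a nonzero `h0` shifts only `z` and `p`, not the interval.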

cj_anova takes a model formula (“reduced” model) and generates a “full” model with two-way interactions between the variables specified in by and all RHS variables in formula, then computes an F-test comparing the two models, providing a test for whether preferences vary across levels of by. This is, in essence, a test of whether all such interaction coefficients are distinguishable from zero. (Because the test depends on overall model fit, not the coefficient variances, clustering is irrelevant.)
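
The reduced-versus-full comparison can be sketched in base R with `lm` and `anova` on simulated data. This is only an analogue of the idea described above, not the function's actual implementation. The data frame `d` and the variables `feature`, `group`, and `y` are hypothetical stand-ins for a conjoint feature, a `by` variable, and the outcome.

```r
set.seed(1)
n <- 400

# Simulated stand-in data; all names here are hypothetical
d <- data.frame(
  feature = factor(sample(c("a", "b", "c"), n, replace = TRUE)),
  group   = factor(sample(c("g1", "g2"), n, replace = TRUE))
)
d$y <- rbinom(n, 1, 0.5)

# "Reduced" model: the feature only
reduced <- lm(y ~ feature, data = d)

# "Full" model: adds the grouping variable and its interaction with the feature
full <- lm(y ~ feature * group, data = d)

# F-test comparing the two nested models
tab <- anova(reduced, full)
tab
```

A small p-value in the second row of `tab` would suggest that the outcome's dependence on `feature` varies across levels of `group`. Here the outcome is simulated without any such dependence.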

mm_diffs provides a data frame of differences in marginal means (literally differencing the results from mm across levels of by). This provides the clearest direct measure of preference differences from a conjoint design.
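
The differencing can be illustrated by hand in base R. The sketch below uses simulated data with hypothetical variable names (`feature`, `group`, `y`). It computes point estimates only and omits the standard errors, weights, and clustering that the package functions handle.

```r
set.seed(2)
n <- 600

# Simulated stand-in data; all names here are hypothetical
d <- data.frame(
  feature = factor(sample(c("a", "b", "c"), n, replace = TRUE)),
  group   = factor(sample(c("g1", "g2"), n, replace = TRUE))
)
d$y <- rbinom(n, 1, 0.5)

# Marginal mean of the outcome for each feature level, within each group
mm_by_group <- tapply(d$y, list(d$feature, d$group), mean)

# Difference in marginal means, taken against the base level of the group
mm_diff <- mm_by_group[, "g2"] - mm_by_group[, "g1"]
mm_diff
```

Because each entry of `mm_diff` uses only the rows for its own feature level, releveling `feature` reorders the result but does not change any value.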

amce_diffs provides a data frame of differences in AMCEs (the coefficient on an interaction between each RHS factor and the variable in by). This provides an estimate of the difference in causal effects of each factor level relative to the baseline level (i.e., the difference in conditional AMCEs). This quantity is easily misinterpreted as the difference in preferences, which it is not. Rather, it is a difference in the effect of the factor on preferences relative to the baseline/reference category of that feature. If preferences in the reference category differ across levels of by, the difference in conditional AMCEs will have an unpredictable sign and significance, making differences in marginal means a more sensible quantity of interest. See amce_by_reference for a diagnostic.
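
A small numeric sketch may make the reference-category problem concrete. It assumes a single feature with levels `a`, `b`, `c` and two groups `g1`, `g2`, with made-up marginal means. In this simplified single-feature setting, each conditional AMCE is treated as the marginal mean of a level minus that of the reference level within a group. The helper `amce_diff` is hypothetical and not part of the package.

```r
# Made-up marginal means by feature level (rows) and group (columns)
mm <- rbind(
  a = c(g1 = 0.5, g2 = 0.7),
  b = c(g1 = 0.6, g2 = 0.6),
  c = c(g1 = 0.7, g2 = 0.7)
)

# Hypothetical helper: difference in conditional AMCEs (g2 minus g1)
# for a given reference level of the feature
amce_diff <- function(mm, ref) {
  # Within each group (column), subtract the reference level's marginal mean
  amce <- sweep(mm, 2, mm[ref, ])
  amce[, "g2"] - amce[, "g1"]
}

# Differences in conditional AMCEs under two different reference levels
diff_ref_a <- round(amce_diff(mm, "a"), 2)
diff_ref_c <- round(amce_diff(mm, "c"), 2)

# Differences in marginal means do not involve a reference level
diff_mm <- round(mm[, "g2"] - mm[, "g1"], 2)

rbind(diff_ref_a, diff_ref_c, diff_mm)
```

In these made-up numbers the two groups rate level `b` identically, yet the difference in conditional AMCEs for `b` is -0.2 when `a` is the reference and 0 when `c` is the reference. The gap comes entirely from the groups disagreeing about `a`.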

Note: amce_diffs does not work with constrained designs. To obtain such differences, subset the design by constraints and calculate differences within each subset.

amce_diffs and mm_diffs return a data frame similar to the one returned by cj, including a BY column (with the value “Difference”) for easy merging with results returned by that function.

cj_anova returns an anova object.

Thomas J. Leeper <thosjleeper@gmail.com>

amce mm cj_freqs plot.cj_amce

data("immigration")
immigration$contest_no <- factor(immigration$contest_no)
# Test for heterogeneity by profile order
cj_anova(immigration, ChosenImmigrant ~ Gender + Education + LanguageSkills, by = ~ contest_no)

# Test for heterogeneity by CountryOfOrigin feature
cj_anova(immigration, ChosenImmigrant ~ Gender + Education, by = ~ CountryOfOrigin)


# Differences in MMs by Gender feature
mm_diffs(immigration, ChosenImmigrant ~ LanguageSkills + Education, ~ Gender, id = ~ CaseID)

# Differences in AMCEs by Gender feature (i.e., feature interactions)
amce_diffs(immigration, ChosenImmigrant ~ LanguageSkills + Education, ~ Gender, id = ~ CaseID)


# preferences differ for Male and Female immigrants with 'Broken English' ability
(m1 <- mm_diffs(immigration, ChosenImmigrant ~ LanguageSkills, ~ Gender, id = ~ CaseID))

# yet differences in conditional AMCEs depend on the reference category
amce_diffs(immigration, ChosenImmigrant ~ LanguageSkills, ~ Gender, id = ~ CaseID)
immigration$LanguageSkills2 <- relevel(immigration$LanguageSkills, "Used Interpreter")
amce_diffs(immigration, ChosenImmigrant ~ LanguageSkills2, ~ Gender, id = ~ CaseID)

# while differences in MMs do not depend on the reference category
(m2 <- mm_diffs(immigration, ChosenImmigrant ~ LanguageSkills2, ~ Gender, id = ~ CaseID))