CausalANOVA: Estimating the AMEs and AMIEs with the CausalANOVA.
In FindIt: Finding Heterogeneous Treatment Effects


CausalANOVA estimates coefficients of the specified ANOVA with regularization. By taking differences in coefficients, the function recovers the AMEs and AMIEs.
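
The idea of recovering an effect as a difference in coefficients can be illustrated with a minimal base-R sketch. It is not the package's internal code: it uses an unregularized `lm` fit with sum-to-zero contrasts on simulated data, and all variable names (`A`, `B`, `y`, `beta_A`) are hypothetical.

```r
# Simulated two-factor experiment (hypothetical data, for illustration only)
set.seed(1)
n <- 600
A <- factor(sample(c("a1", "a2", "a3"), n, replace = TRUE))
B <- factor(sample(c("b1", "b2"), n, replace = TRUE))
y <- 0.5 + 0.2 * (A == "a2") - 0.1 * (A == "a3") + 0.3 * (B == "b2") +
  rnorm(n, sd = 0.1)
d <- data.frame(y, A, B)

# ANOVA parameterization: level coefficients sum to zero within each factor
fit_sum <- lm(y ~ A + B, data = d,
              contrasts = list(A = "contr.sum", B = "contr.sum"))
b <- coef(fit_sum)

# contr.sum reports the first two levels; the third is minus their sum
beta_A <- c(a1 = b[["A1"]], a2 = b[["A2"]], a3 = -(b[["A1"]] + b[["A2"]]))

# Effect of a2 relative to a1 as a difference in level coefficients
ame_a2_vs_a1 <- beta_A[["a2"]] - beta_A[["a1"]]

# The same quantity from the usual treatment-contrast fit
fit_trt <- lm(y ~ A + B, data = d)
print(c(difference = ame_a2_vs_a1, treatment = coef(fit_trt)[["Aa2"]]))
```

In this additive toy model the two numbers coincide because both fits are reparameterizations of the same regression; CausalANOVA additionally applies regularization when requested.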

CausalANOVA(
  formula,
  int2.formula = NULL,
  int3.formula = NULL,
  data,
  nway = 1,
  pair.id = NULL,
  diff = FALSE,
  screen = FALSE,
  screen.type = "fixed",
  screen.num.int = 3,
  collapse = FALSE,
  collapse.type = "fixed",
  collapse.cost = 0.3,
  family = "binomial",
  cluster = NULL,
  maxIter = 50,
  eps = 1e-05,
  fac.level = NULL,
  ord.fac = NULL,
  select.prob = FALSE,
  boot = 100,
  seed = 1234,
  verbose = TRUE
)

`formula`	A formula that specifies outcome and treatment variables.
`int2.formula`	(optional). A formula that specifies two-way interactions.
`int3.formula`	(optional). A formula that specifies three-way interactions.
`data`	An optional data frame, list or environment (or object coercible by 'as.data.frame' to a data frame) containing the variables in the model. If not found in 'data', the variables are taken from 'environment(formula)', typically the environment from which 'CausalANOVA' is called.
`nway`	With `nway=1`, the function estimates the Average Marginal Effects (AMEs) only. With `nway=2`, the function estimates the AMEs and the two-way Average Marginal Interaction Effects (AMIEs). With `nway=3`, the function estimates the AMEs, the two-way and three-way AMIEs. Default is 1.
`pair.id`	(optional). Unique identifiers for each comparison pair. This option is used when `diff=TRUE`.
`diff`	A logical indicating whether the outcome is the choice between a pair. If `diff=TRUE`, `pair.id` should identify the pairs being compared. Default is `FALSE`.
`screen`	A logical indicating whether to select significant factor interactions with `glinternet`. When users specify interactions using `int2.formula` or `int3.formula`, this option is ignored. `screen` should be used only when users want data-driven selection of factor interactions. With `screen.type`, users can specify how to screen factor interactions. We recommend using this option when the number of factors is large, e.g., more than 6. Default is `FALSE`.
`screen.type`	Type for screening factor interactions. (1) `"fixed"` selects a fixed number (specified by `screen.num.int`) of factor interactions. (2) `"cv.min"` selects factor interactions with the tuning parameter giving the minimum cross-validation error. (3) `"cv.1Std"` selects factor interactions with the tuning parameter giving a cross-validation error that is within 1 standard deviation of the minimum cross-validation error.
`screen.num.int`	(optional). The number of factor interactions to select. This option is used when `screen=TRUE` and `screen.type="fixed"`. Default is 3.
`collapse`	A logical indicating whether to collapse insignificant levels within factors. With `collapse.type`, users can specify how to collapse levels within factors. We recommend using this option when the number of levels is large, e.g., more than 6. Default is `FALSE`.
`collapse.type`	Type for collapsing levels within factors. (1) `"fixed"` collapses levels with the fixed cost parameter (specified by `collapse.cost`). (2) `"cv.min"` collapses levels with the cost parameter giving the minimum cross-validation error. This option might take time. (3) `"cv.1Std"` collapses with the cost parameter giving a cross-validation error that is within 1 standard deviation of the minimum cv error. This option might take time.
`collapse.cost`	(optional). A cost parameter ranging from 0 to 1. A value of 1 corresponds to no collapsing; the closer to 0, the stronger the regularization. Default is 0.3.
`family`	The family of the outcome variable: `"gaussian"` for continuous outcomes, `"binomial"` for binary outcomes. Default is `"binomial"`.
`cluster`	Unique identifiers with which clustered standard errors are computed.
`maxIter`	The maximum number of iterations for `glinternet`.
`eps`	A tolerance parameter in the internal optimization algorithm.
`fac.level`	(optional). A vector containing the number of levels in each factor. The order of `fac.level` should match the order of columns in the data. For example, when the first and second columns of the design matrix are "Education" and "Race", the first and second elements of `fac.level` should be the number of levels in "Education" and "Race", respectively.
`ord.fac`	(optional). A logical vector indicating whether each factor has ordered (`TRUE`) or unordered (`FALSE`) levels. When levels are ordered, the function uses the order given by `levels()` and places penalties on the differences between adjacent levels. When levels are unordered, the function places penalties on the differences for every pairwise comparison.
`select.prob`	(optional). A logical indicating whether selection probabilities are computed. This option might take time.
`boot`	The number of bootstrap replicates for `select.prob`. Default is 100.
`seed`	Seed for bootstrap.
`verbose`	A logical indicating whether to print the value of the cost parameter used.
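
To make the `ord.fac` description concrete, the sketch below lists which pairs of levels would have their difference penalized under each setting. The helper `penalized_pairs` is hypothetical and written for illustration only; it is not part of FindIt and assumes a factor with at least two levels.

```r
# Hypothetical helper: enumerate the level pairs whose differences are penalized
penalized_pairs <- function(lev, ordered) {
  k <- length(lev)
  idx <- if (ordered) {
    cbind(seq_len(k - 1), seq_len(k - 1) + 1)  # adjacent levels only
  } else {
    t(combn(k, 2))                             # every pairwise comparison
  }
  data.frame(level1 = lev[idx[, 1]], level2 = lev[idx[, 2]])
}

# Ordered factor with 3 levels: 2 adjacent pairs
print(penalized_pairs(c("jobs", "clinic", "education"), ordered = TRUE))

# Unordered factor with 3 levels: all 3 pairs
print(penalized_pairs(c("jobs", "clinic", "education"), ordered = FALSE))
```

With k levels, an ordered factor yields k - 1 penalized differences and an unordered factor yields k(k - 1)/2, which is why the level order given by `levels()` matters for ordered factors.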

Regularization: screen and collapse.

Users can implement regularization to reduce the false discovery rate and facilitate interpretation. This is particularly useful when analyzing factorial experiments with a large number of factors, each having many levels.

When screen=TRUE, the function selects significant factor interactions with glinternet (Lim and Hastie 2015) before estimating the AMEs and AMIEs. This option is recommended when there are many factors, e.g., more than 6 factors. Alternatively, users can pre-specify interactions of interest using int2.formula and int3.formula.
When collapse=TRUE, the function collapses insignificant levels within each factor by GASH-ANOVA (Post and Bondell 2013) before estimating the AMEs and AMIEs. This option is recommended when there are many levels within some factors, e.g., more than 6 levels.
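
The difference between the `"cv.min"` and `"cv.1Std"` choices for `screen.type` and `collapse.type` can be sketched with the common one-standard-error rule. The numbers below are made up for illustration and are not output from `glinternet` or CausalANOVA; that the package applies exactly this rule is an assumption.

```r
# Made-up cross-validation results over a grid of penalty values
lambda <- c(1.0, 0.5, 0.25, 0.125, 0.0625)       # larger = stronger penalty
cv_err <- c(0.260, 0.235, 0.222, 0.220, 0.230)   # mean CV error
cv_se  <- c(0.006, 0.005, 0.005, 0.004, 0.005)   # standard error of CV error

# "cv.min": the penalty with the smallest CV error
i_min <- which.min(cv_err)
lambda_min <- lambda[i_min]

# "cv.1Std": the strongest penalty whose CV error is within one
# standard error of the minimum
threshold <- cv_err[i_min] + cv_se[i_min]
lambda_1std <- max(lambda[cv_err <= threshold])

print(c(cv.min = lambda_min, cv.1Std = lambda_1std))
```

The one-standard-error choice trades a small increase in cross-validation error for a sparser, more heavily regularized model.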

Inference after Regularization:

When screen=TRUE or collapse=TRUE, we recommend using the test.CausalANOVA function to make valid inference after regularization. It takes the output of CausalANOVA, re-estimates the AMEs and AMIEs with newdata, and provides confidence intervals. Ideally, users should split the sample into two halves: use one half for regularization with CausalANOVA and the other half for inference with test.CausalANOVA.
If regularization is not needed, specify screen=FALSE and collapse=FALSE. The function then estimates the AMEs and AMIEs and computes confidence intervals with the full sample.
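
One way to split the sample is to assign whole clusters (e.g., respondents) to one half or the other, so that no cluster contributes to both regularization and inference. The helper `split_by_cluster` and the toy data below are hypothetical and only sketch this idea in base R.

```r
# Hypothetical helper: split rows into two halves at the cluster level
split_by_cluster <- function(data, cluster, seed = 1234) {
  set.seed(seed)                       # make the split reproducible
  ids <- unique(cluster)
  train_ids <- sample(ids, size = floor(length(ids) / 2), replace = FALSE)
  in_train <- cluster %in% train_ids
  list(train = data[in_train, , drop = FALSE],
       test  = data[!in_train, , drop = FALSE])
}

# Toy data: 10 respondents, 4 rows each
toy <- data.frame(resp = rep(1:10, each = 4), y = rep(c(0, 1), 20))
halves <- split_by_cluster(toy, toy$resp)

# No respondent appears in both halves
print(intersect(halves$train$resp, halves$test$resp))
```

The training half would then be passed to CausalANOVA and the test half to test.CausalANOVA as `newdata`.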

Suggested Workflow: (See Examples below as well)

Specify the order of levels within each factor using levels(). When collapse=TRUE, the function penalizes the differences between adjacent levels of ordered factors, so it is crucial to specify the order of levels within each factor carefully.
Run CausalANOVA.
1. Specify formula to indicate outcomes and treatment variables and nway to indicate the order of interactions.
2. Specify diff=TRUE and pair.id if the outcome is the choice between a pair.
3. Specify screen. Set screen=TRUE to implement data-driven selection of factor interactions, or screen=FALSE to specify interactions by hand through int2.formula and int3.formula.
4. Specify collapse. Set collapse=TRUE to implement data-driven collapsing of insignificant levels, or collapse=FALSE to use the original number of levels.
Run test.CausalANOVA when screen=TRUE or collapse=TRUE.
Run summary and plot to explore the AMEs and AMIEs.
Estimate conditional effects using ConditionalEffect function and visualize them using plot function.

`intercept`	An intercept of the estimated ANOVA model. If `diff=TRUE`, this should be close to 0.5.
`formula`	The `formula` used in the function.
`coefs`	A named vector of coefficients of the estimated ANOVA model.
`vcov`	The variance-covariance matrix for `coefs`. Only when `screen=FALSE` and `collapse=FALSE`.
`CI.table`	The summary of AMEs and AMIEs with confidence intervals. Only when `screen=FALSE` and `collapse=FALSE`.
`AME`	The estimated AMEs with the grand-mean as baselines.
`AMIE2`	The estimated two-way AMIEs with the grand-mean as baselines.
`AMIE3`	The estimated three-way AMIEs with the grand-mean as baselines.
`...`	arguments passed to the function or arguments only for the internal use.

Naoki Egami and Kosuke Imai.

Egami, Naoki and Kosuke Imai. 2019. Causal Interaction in Factorial Experiments: Application to Conjoint Analysis, Journal of the American Statistical Association. http://imai.fas.harvard.edu/research/files/int.pdf

Lim, M. and Hastie, T. 2015. Learning interactions via hierarchical group-lasso regularization. Journal of Computational and Graphical Statistics 24, 3, 627–654.

Post, J. B. and Bondell, H. D. 2013. Factor selection and structural identification in the interaction anova model. Biometrics 69, 1, 70–79.

cv.CausalANOVA

library(FindIt)
data(Carlson)
## Specify the order of each factor
Carlson$newRecordF<- factor(Carlson$newRecordF,ordered=TRUE,
                            levels=c("YesLC", "YesDis","YesMP",
                                     "noLC","noDis","noMP","noBusi"))
Carlson$promise <- factor(Carlson$promise,ordered=TRUE,levels=c("jobs","clinic","education"))
Carlson$coeth_voting <- factor(Carlson$coeth_voting,ordered=FALSE,levels=c("0","1"))
Carlson$relevantdegree <- factor(Carlson$relevantdegree,ordered=FALSE,levels=c("0","1"))

## ####################################### 
## Without Screening and Collapsing
## ####################################### 
#################### only AMEs ####################
fit1 <- CausalANOVA(formula=won ~ newRecordF + promise + coeth_voting + relevantdegree,
                    data=Carlson, pair.id=Carlson$contestresp, diff=TRUE,
                    cluster=Carlson$respcodeS, nway=1)
summary(fit1)
plot(fit1)

#################### AMEs and two-way AMIEs ####################
fit2 <- CausalANOVA(formula=won ~ newRecordF + promise + coeth_voting + relevantdegree,
                    int2.formula = ~ newRecordF:coeth_voting,
                    data=Carlson, pair.id=Carlson$contestresp,diff=TRUE,
                    cluster=Carlson$respcodeS, nway=2)
summary(fit2)
plot(fit2, type="ConditionalEffect", fac.name=c("newRecordF","coeth_voting"))
ConditionalEffect(fit2, treat.fac="newRecordF", cond.fac="coeth_voting")

## Not run: 
#################### AMEs and two-way and three-way AMIEs ####################
## Note: All pairs within three-way interactions should show up in int2.formula (Strong Hierarchy).
fit3 <- CausalANOVA(formula=won ~ newRecordF + promise + coeth_voting + relevantdegree,
                    int2.formula = ~ newRecordF:promise + newRecordF:coeth_voting
                                       + promise:coeth_voting,
                    int3.formula = ~ newRecordF:promise:coeth_voting,
                    data=Carlson, pair.id=Carlson$contestresp,diff=TRUE,
                    cluster=Carlson$respcodeS, nway=3)
summary(fit3)
plot(fit3, type="AMIE", fac.name=c("newRecordF","promise", "coeth_voting"),space=25,adj.p=2.2)

## End(Not run)

## ####################################### 
## With Screening and Collapsing
## #######################################
## Sample Splitting
set.seed(1234)  # fix the seed so the train/test split is reproducible
train.ind <- sample(unique(Carlson$respcodeS), 272, replace=FALSE)
test.ind <- setdiff(unique(Carlson$respcodeS), train.ind)
Carlson.train <- Carlson[is.element(Carlson$respcodeS,train.ind), ]
Carlson.test <- Carlson[is.element(Carlson$respcodeS,test.ind), ]
 
#################### AMEs and two-way AMIEs ####################
fit.r2 <- CausalANOVA(formula=won ~ newRecordF + promise + coeth_voting + relevantdegree,
                      data=Carlson.train, pair.id=Carlson.train$contestresp,diff=TRUE,
                      screen=TRUE, collapse=TRUE,
                      cluster=Carlson.train$respcodeS, nway=2)
summary(fit.r2)

## refit with test.CausalANOVA
fit.r2.new <- test.CausalANOVA(fit.r2, newdata=Carlson.test, diff=TRUE,
                               pair.id=Carlson.test$contestresp, cluster=Carlson.test$respcodeS)

summary(fit.r2.new)
plot(fit.r2.new)
plot(fit.r2.new, type="ConditionalEffect", fac.name=c("newRecordF","coeth_voting"))
ConditionalEffect(fit.r2.new, treat.fac="newRecordF", cond.fac="coeth_voting")