groupSearch: groupSearch
In NickCH-K/MagnifiedIV: Magnified Instrumental Variables Estimator


View source: R/groupSearch.R

This function performs the GroupSearch algorithm as in Huntington-Klein (2019) "Instruments with Heterogeneous Effects: Monotonicity, Bias, and Localness."

groupSearch(
  formula,
  data,
  weights,
  ngroups = 4L,
  ntries = 100L,
  id = NULL,
  silent = FALSE,
  ...
)

`formula`	A formula of the form `x ~ z | w1 + w2`, where `x` is the endogenous variable in an instrumental variables model (the variable being predicted here), `z` is one of the instruments, and `w1`, `w2`, etc. are covariates to be partialed out, if any.
`data`	A data.frame.
`weights`	Estimation weights.
`ngroups`	Number of groups to split the data into.
`ntries`	Number of groupings to attempt.
`id`	A variable in `data` that indicates that observations with the same value of `id` should always be in the same group.
`silent`	Suppress the progress report.
`...`	Additional arguments to be passed to `lm()`. Note that `na.action` will be ignored when partialing out the covariates; if you prefer a different `na.action` for that step, partial out the covariates by hand before running `groupSearch`.

The GroupSearch algorithm is naive. It tries `ntries` random groupings of the data into `ngroups` groups and, for each grouping, attempts to predict x with z. It returns, as a factor vector, the grouping that produces the highest F-statistic. Be aware before using groupSearch() that, in the original paper, it did only a mediocre job of picking up effect heterogeneity. You may want to use groupCF() instead.
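The random-search idea described above can be sketched in a few lines of base R. This is an illustration only, not the package's source: the helper name `random_group_search` is hypothetical, the data are synthetic, covariates and `id` handling are omitted, and scoring each grouping by the overall F-statistic of `lm(x ~ z * g)` is an assumption about how the F-statistic might be computed.

```r
# Illustrative sketch only; not the MagnifiedIV implementation.
# Assumption: each grouping is scored by the overall F-statistic of
# lm(x ~ z * g). The package may score groupings differently.
random_group_search <- function(x, z, ngroups = 4L, ntries = 100L) {
  best_f <- -Inf
  best_group <- NULL
  for (i in seq_len(ntries)) {
    # Draw a random, roughly balanced grouping of the observations
    g <- factor(sample(rep_len(seq_len(ngroups), length(x))))
    # Regress x on z, the group indicators, and their interaction
    f <- summary(lm(x ~ z * g))$fstatistic[["value"]]
    # Keep the grouping with the largest F-statistic seen so far
    if (f > best_f) {
      best_f <- f
      best_group <- g
    }
  }
  best_group
}

# Synthetic data, for demonstration only
set.seed(1)
n <- 200
z <- rnorm(n)
x <- 0.5 * z + rnorm(n)
grp <- random_group_search(x, z, ngroups = 4L, ntries = 20L)
table(grp)
```

Because every candidate grouping is drawn at random, the quality of the result depends on how many groupings are tried, which is consistent with the caution above about the algorithm's performance.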

This function is called by magnifiedIV. You can also run Magnified IV yourself, without the magnifiedIV function and with any estimator, by running groupSearch and then adding the resulting group variable as a control in both IV stages, as well as interacted with the instrument. Alternatively, use factorPull() to get the individual-level effect estimates and use those to construct a sample weight.
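As a rough sketch of the by-hand approach, the two stages can be written with plain `lm()` calls. Everything here is assumed for illustration: the data are synthetic, the grouping `g` is a random placeholder standing in for the factor that groupSearch would return, and two-stage least squares via `lm()` is just one possible estimator. Standard errors from the second `lm()` fit are not valid IV standard errors, so a dedicated IV routine would be needed for inference.

```r
# Illustrative sketch with synthetic data; not the package's implementation.
set.seed(42)
n <- 500
z <- rnorm(n)                        # instrument
u <- rnorm(n)                        # unobserved confounder
x <- 0.8 * z + u + rnorm(n)          # endogenous regressor
y <- 1.5 * x + u + rnorm(n)          # outcome

# Placeholder grouping; in practice this would be the groupSearch() result
g <- factor(sample(rep_len(1:4, n)))

# First stage: group as a control, plus instrument-by-group interactions
first_stage <- lm(x ~ z * g)
xhat <- fitted(first_stage)

# Second stage: group enters again as a control
second_stage <- lm(y ~ xhat + g)
est <- coef(second_stage)[["xhat"]]
est
```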

# Get data
data(CPS1985, package = 'AER')

# Split the data into 10 random groups 100 times, and each time see how the effect of
# education in predicting wages varies across the sample, after controlling
# for all the other variables in the data, plus a squared term on experience.
# Return the grouping with the largest resulting F statistic.
edeffect <- groupSearch(wage ~ education |
                          experience + I(experience^2) + age + ethnicity +
                          region + gender + occupation +
                          sector + union + married,
                        data = CPS1985, ngroups = 10)

table(edeffect)