penhdfeppml_cluster: Plugin Lasso Estimation

Description

Performs plugin lasso PPML estimation with high-dimensional fixed effects (HDFE). This function is called internally by mlfitppml and penhdfeppml when users select the method = "plugin" option, but it is also made available as a stand-alone option for advanced users who may prefer to avoid some of the overhead imposed by those wrappers.

Usage

penhdfeppml_cluster(
  data,
  dep = 1,
  indep = NULL,
  fixed = NULL,
  cluster = NULL,
  selectobs = NULL,
  ...
)

Arguments

data

A data frame containing all relevant variables.

dep

A string with the name of the dependent variable or a column number.

indep

A vector with the names or column numbers of the regressors. If left unspecified, all remaining variables (excluding fixed effects) are included in the regressor matrix.

fixed

A vector with the names or column numbers of factor variables identifying the fixed effects, or a list with the desired interactions between variables in data.

cluster

A string with the name of the clustering variable or a column number. It is also possible to input a vector with several variables, in which case the interaction of all of them is taken as the clustering variable. Note that, despite the NULL default, this argument is NOT OPTIONAL here: the plugin algorithm requires clusters to be specified.

selectobs

Optional. A vector indicating which observations to use (either a logical vector or a numeric vector with row numbers, as usual when subsetting in R).

...

Further options. For a full list of options, see penhdfeppml_cluster_int.
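The cluster and selectobs conventions above can be illustrated with base R alone. The sketch below is a hedged illustration, not the package's code: it assumes that combining several cluster variables behaves like base R's interaction() with unused combinations dropped, and the toy data frame df is hypothetical.

```r
# Hypothetical toy data (not the package's trade data set)
df <- data.frame(
  exp    = c("ARG", "ARG", "BRA", "BRA", "CHL", "CHL"),
  imp    = c("USA", "CAN", "USA", "CAN", "USA", "USA"),
  export = c(10, 5, 20, 8, 3, 4)
)

# cluster = c("exp", "imp"): one identifier per observed (exp, imp) pair.
# Assumption: equivalent to interaction() with drop = TRUE, which discards
# combinations that never occur in the data.
cl <- interaction(df$exp, df$imp, drop = TRUE)
nlevels(cl)  # 5: the pair CHL-USA appears twice

# selectobs: a logical vector and the corresponding row numbers pick the same rows
keep_logical <- df$imp == "USA"
keep_rows    <- which(keep_logical)
identical(df[keep_logical, ], df[keep_rows, ])  # TRUE
```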

Details

This function is a thin wrapper around penhdfeppml_cluster_int, providing a more convenient interface for data frames. Whereas the internal function requires some preliminary handling of data sets (y must be a vector, x must be a matrix, and fes must be provided in a list), the wrapper takes a full data frame in the data argument, and users can simply specify which variables correspond to y, x and the fixed effects, using either variable names or column numbers.
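The kind of preprocessing the wrapper saves users from can be sketched in base R. The helper prep_inputs below is hypothetical and is not part of penppml; it only assumes the input conventions described on this page (names or column numbers for dep and indep, and fixed given either as a vector of variables or as a list of variable sets to be interacted). It does not reproduce the internal function's exact argument handling.

```r
# Hypothetical helper (not part of penppml): turns a data frame into
# y (vector), x (matrix) and fes (list of factors).
prep_inputs <- function(data, dep = 1, indep = NULL, fixed = NULL, selectobs = NULL) {
  if (!is.null(selectobs)) data <- data[selectobs, , drop = FALSE]
  # Resolve column numbers to column names; names pass through unchanged
  to_names <- function(v) if (is.numeric(v)) names(data)[v] else v
  dep <- to_names(dep)
  # Each fixed effect is a single variable or a set of variables to interact
  fixed <- if (is.list(fixed)) lapply(fixed, to_names) else as.list(to_names(fixed))
  fes <- lapply(fixed, function(vars) interaction(data[vars], drop = TRUE))
  # Default regressors: everything except y and the fixed-effect variables
  fe_vars <- unique(unlist(fixed))
  if (is.null(indep)) indep <- setdiff(names(data), c(dep, fe_vars))
  indep <- to_names(indep)
  list(y = data[[dep]], x = as.matrix(data[indep]), fes = fes)
}

# Usage with hypothetical data
df <- data.frame(exp = c("A", "A", "B", "B"), imp = c("X", "Y", "X", "Y"),
                 time = c(1, 1, 2, 2), export = c(1, 2, 3, 4), fta = c(0, 1, 0, 1))
inp <- prep_inputs(df, dep = "export",
                   fixed = list(c("exp", "time"), c("imp", "time")))
str(inp)
```

Passing the fixed effects as a list of factors keeps each high-dimensional effect as a grouping variable rather than expanding it into dummy columns of x.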

The plugin method uses coefficient-specific penalty weights that account for heteroskedasticity. The penalty parameters are calculated automatically by the function using statistical theory. For a brief discussion, see Breinlich, Corradi, Rocha, Ruta, Santos Silva and Zylkin (2021); for a more in-depth analysis, see Belloni, Chernozhukov, Hansen and Kozbur (2016), which introduced the specific implementation used in this package. Heuristically, the penalty parameters are set high enough that a regressor is selected only if the absolute value of its score is statistically large relative to its standard error.
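The heuristic above can be made concrete with a small sketch in the spirit of the cluster-robust penalty loadings of Belloni, Chernozhukov, Hansen and Kozbur (2016). It is a simplified illustration for a linear lasso, not the penppml implementation. It assumes x holds already-demeaned regressors, e holds residuals at the all-zero starting point, cl identifies clusters, and the plugin choice λ = 2c√n Φ⁻¹(1 − γ/(2p)); the constants c_pen = 1.1 and gamma = 0.1 / log(n) are illustrative, and the function names plugin_weights and selected_at_zero are hypothetical. Under those assumptions, a regressor moves away from zero only if its standardized score exceeds c Φ⁻¹(1 − γ/(2p)).

```r
# phi_j: cluster-robust, coefficient-specific penalty weight for regressor j
plugin_weights <- function(x, e, cl) {
  n <- nrow(x)
  s <- x * e                    # score contributions x_ij * e_i (e recycled down columns)
  s_cl <- rowsum(s, group = cl) # sum score contributions within each cluster
  sqrt(colSums(s_cl^2) / n)     # root of (1/n) * sum of squared cluster scores
}

# Does regressor j leave zero? Compare a t-statistic-like standardized score
# with c_pen times a normal quantile that grows with the number of regressors p.
selected_at_zero <- function(x, e, cl, c_pen = 1.1, gamma = 0.1 / log(nrow(x))) {
  n <- nrow(x)
  p <- ncol(x)
  phi <- plugin_weights(x, e, cl)
  tstat <- abs(colSums(x * e)) / (sqrt(n) * phi)
  tstat > c_pen * qnorm(1 - gamma / (2 * p))
}

# Hypothetical example: 50 clusters of 10 observations each
set.seed(1)
n  <- 500
cl <- rep(1:50, each = 10)
e  <- rnorm(n)
x1 <- e + rnorm(n, sd = 0.5)         # strongly related to the residuals
z  <- rnorm(n)
x2 <- z - e * sum(z * e) / sum(e^2)  # orthogonal to e by construction
selected_at_zero(cbind(x1 = x1, x2 = x2), e, cl)  # x1 TRUE, x2 FALSE
```

Summing scores within clusters before squaring is what lets the weights reflect within-cluster correlation, which is why the cluster argument is required.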

Value

An object of class elnet with the elements described in glmnet, as well as the following:

  • mu: a 1 x length(y) matrix with the final values of the conditional mean μ.

  • deviance: the deviance of the fitted model.

  • bic: Bayesian Information Criterion.

  • phi: coefficient-specific penalty weights.

  • x_resid: matrix of demeaned regressors.

  • z_resid: vector of demeaned (transformed) dependent variable.

References

Breinlich, H., Corradi, V., Rocha, N., Ruta, M., Santos Silva, J.M.C. and T. Zylkin (2021). "Machine Learning in International Trade Research: Evaluating the Impact of Trade Agreements", Policy Research Working Paper; No. 9629. World Bank, Washington, DC.

Correia, S., P. Guimaraes and T. Zylkin (2020). "Fast Poisson estimation with high dimensional fixed effects", Stata Journal, 20, 90-115.

Gaure, S. (2013). "OLS with multiple high dimensional category variables", Computational Statistics & Data Analysis, 66, 8-18.

Friedman, J., T. Hastie, and R. Tibshirani (2010). "Regularization paths for generalized linear models via coordinate descent", Journal of Statistical Software, 33, 1-22.

Belloni, A., V. Chernozhukov, C. Hansen and D. Kozbur (2016). "Inference in high dimensional panel models with an application to gun control", Journal of Business & Economic Statistics, 34, 590-605.

Examples

## Not run: 
library(penppml)
# To reduce run time, we keep only countries in the Americas:
americas <- countries$iso[countries$region == "Americas"]
test <- penhdfeppml_cluster(data = trade[, -(5:6)],
                            dep = "export",
                            fixed = list(c("exp", "time"),
                                         c("imp", "time"),
                                         c("exp", "imp")),
                            cluster = c("exp", "imp"),
                            selectobs = (trade$imp %in% americas) & (trade$exp %in% americas),
                            tol = 1e-5, hdfetol = 1e-1)

## End(Not run)


penppml documentation built on Sept. 8, 2023, 5:58 p.m.