penhdfeppml: One-Shot Penalized PPML Estimation with HDFE
In penppml: Penalized Poisson Pseudo Maximum Likelihood Regression

Description

penhdfeppml fits a penalized PPML regression for a given type of penalty and a given value of the penalty parameter. The penalty can be either lasso or ridge, and the plugin method can be enabled via the method argument.

Usage

penhdfeppml(
  data,
  dep = 1,
  indep = NULL,
  fixed = NULL,
  cluster = NULL,
  selectobs = NULL,
  ...
)

Arguments

`data`	A data frame containing all relevant variables.
`dep`	A string with the name of the dependent variable or a column number.
`indep`	A vector with the names or column numbers of the regressors. If left unspecified, all remaining variables (excluding fixed effects) are included in the regressor matrix.
`fixed`	A vector with the names or column numbers of factor variables identifying the fixed effects, or a list with the desired interactions between variables in `data`.
`cluster`	Optional. A string with the name of the clustering variable or a column number. It's also possible to input a vector with several variables, in which case the interaction of all of them is taken as the clustering variable.
`selectobs`	Optional. A vector indicating which observations to use (either a logical vector or a numeric vector with row numbers, as usual when subsetting in R).
`...`	Further options, including: `penalty`: A string indicating the penalty type. Currently supported: "lasso" and "ridge". `method`: The user can set this equal to "plugin" to perform the plugin algorithm with coefficient-specific penalty weights (see details). Otherwise, a single global penalty is used. For a full list of options, see penhdfeppml_int.
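
As a rough illustration of the `selectobs` and `cluster` conventions described above, the base R sketch below uses a toy data frame whose column names are invented for this example. It shows the ordinary subsetting and interaction operations the argument descriptions refer to; it is not the package's internal code.

```r
# Toy data frame; the column names are assumptions made for this example.
df <- data.frame(
  exp  = c("ARG", "ARG", "BRA", "BRA", "CHL", "CHL"),
  imp  = c("BRA", "CHL", "ARG", "CHL", "ARG", "BRA"),
  flow = c(10, 0, 7, 3, 0, 5)
)

# selectobs can be a logical vector ...
keep_lgl <- df$exp != "CHL"
# ... or the equivalent numeric vector of row numbers.
keep_num <- which(keep_lgl)

# Both forms select the same rows under ordinary R subsetting.
sub_lgl <- df[keep_lgl, ]
sub_num <- df[keep_num, ]

# Several clustering variables can be combined into a single factor whose
# levels are the observed combinations (here: exporter-importer pairs).
pair <- interaction(df$exp, df$imp, drop = TRUE)
```

Here `pair` plays the role of a single clustering variable built from two columns, which is how the description of `cluster` reads when several variables are supplied.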

Details

This function is a thin wrapper around penhdfeppml_int, providing a more convenient interface for data frames. Whereas the internal function requires some preliminary handling of data sets (y must be a vector, x must be a matrix and fes must be provided in a list), the wrapper takes a full data frame in the data argument, and users can simply specify which variables correspond to y, x and the fixed effects, using either variable names or column numbers.
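
The kind of preliminary data handling mentioned above can be sketched in base R as follows. The helper `split_inputs` and the toy data are hypothetical and written only for illustration; the actual wrapper may resolve its arguments differently.

```r
# Hypothetical helper (not part of penppml): turn a data frame into a
# response vector y, a regressor matrix x and a list of fixed effects fes.
split_inputs <- function(data, dep = 1, indep = NULL, fixed = NULL) {
  # Accept either variable names or column numbers.
  as_names <- function(v) if (is.numeric(v)) names(data)[v] else v
  dep <- as_names(dep)
  fixed_names <- unique(unlist(lapply(fixed, as_names)))
  if (is.null(indep)) {
    # Default: every remaining column that is not a fixed-effect variable.
    indep <- setdiff(names(data), c(dep, fixed_names))
  } else {
    indep <- as_names(indep)
  }
  # Each element of `fixed` yields one fixed effect; several variables in
  # the same element are interacted.
  fes <- lapply(fixed, function(f) interaction(data[as_names(f)], drop = TRUE))
  list(y = data[[dep]], x = as.matrix(data[indep]), fes = fes)
}

# Toy data with invented values.
toy <- data.frame(
  export = c(5, 0, 2, 8),
  exp    = c("A", "A", "B", "B"),
  imp    = c("B", "C", "A", "C"),
  time   = c(1, 2, 1, 2),
  tariff = c(0.1, 0.2, 0.0, 0.3),
  fta    = c(1, 0, 1, 0)
)
parts <- split_inputs(toy, dep = "export",
                      fixed = list(c("exp", "time"), c("imp", "time")))
```

With these inputs, `parts$x` holds the two remaining columns (`tariff` and `fta`) and `parts$fes` holds one factor per requested interaction.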

More formally, penhdfeppml_int performs iteratively re-weighted least squares (IRLS) on a transformed model, as described in Breinlich, Corradi, Rocha, Ruta, Santos Silva and Zylkin (2021). In each iteration, the function calculates the transformed dependent variable, partials out the fixed effects (calling collapse::fhdwithin) and then calls glmnet::glmnet if the selected penalty is lasso (the default). If the user has selected ridge, the analytical solution is instead computed directly using a fast C++ implementation.
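
The iteration described above can be sketched in base R for the ridge case. This is a simplified illustration, not the package's implementation: it handles a single fixed effect through weighted group demeaning (the helper `wdemean` is hypothetical), uses plain R instead of C++, and the penalty scaling `lambda * I` is an assumption that need not match the scaling used by penhdfeppml_int. The lasso case would replace the closed-form step with a call to glmnet::glmnet.

```r
# Weighted within-group demeaning for ONE fixed effect (hypothetical helper).
wdemean <- function(v, g, w) {
  v - ave(w * v, g, FUN = sum) / ave(w, g, FUN = sum)
}

# Illustrative ridge-penalized PPML via IRLS; not the penppml implementation.
ppml_ridge_irls <- function(y, x, g, lambda = 0, tol = 1e-10, maxit = 200) {
  x <- as.matrix(x)
  mu <- (y + mean(y)) / 2          # strictly positive starting values
  eta <- log(mu)
  beta <- rep(0, ncol(x))
  for (it in seq_len(maxit)) {
    z <- eta + (y - mu) / mu       # transformed dependent variable
    z_resid <- wdemean(z, g, mu)   # partial out the fixed effect (weights = mu)
    x_resid <- apply(x, 2, wdemean, g = g, w = mu)
    # Closed-form ridge step: (X'WX + lambda * I)^{-1} X'Wz
    A <- crossprod(x_resid, mu * x_resid) + lambda * diag(ncol(x))
    beta_new <- drop(solve(A, crossprod(x_resid, mu * z_resid)))
    # Fitted index = working response minus the weighted least-squares residual
    eta <- z - (z_resid - drop(x_resid %*% beta_new))
    mu <- exp(eta)
    converged <- max(abs(beta_new - beta)) < tol
    beta <- beta_new
    if (converged) break
  }
  list(beta = beta, mu = mu, iterations = it)
}

# Simulated data (invented for this example): two regressors, ten groups.
set.seed(42)
n <- 300
g <- rep(1:10, each = 30)
x <- cbind(x1 = rnorm(n), x2 = rnorm(n))
y <- rpois(n, exp(0.5 + 0.3 * x[, "x1"] - 0.2 * x[, "x2"] + 0.05 * g))

unpenalized <- ppml_ridge_irls(y, x, g, lambda = 0)
ridge <- ppml_ridge_irls(y, x, g, lambda = 50)
```

With `lambda = 0` the sketch reduces to ordinary Poisson regression with group dummies, so its coefficients can be checked against `glm(y ~ x + factor(g), family = poisson())`; a positive `lambda` shrinks the coefficient vector towards zero.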

For information on how the plugin lasso method works, see penhdfeppml_cluster.

Value

If penalty == "lasso" (the default), an object of class elnet with the elements described in glmnet, as well as:

mu: a 1 x length(y) matrix with the final values of the conditional mean μ.
deviance.
bic: Bayesian Information Criterion.
phi: coefficient-specific penalty weights (only if method == "plugin").
x_resid: matrix of demeaned regressors.
z_resid: vector of demeaned (transformed) dependent variable.

If penalty == "ridge", a list with the following elements:

beta: a 1 x ncol(x) matrix with coefficient (beta) estimates.
mu: a 1 x length(y) matrix with the final values of the conditional mean μ.
deviance.
bic: Bayesian Information Criterion.
x_resid: matrix of demeaned regressors.
z_resid: vector of demeaned (transformed) dependent variable.
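
For readers who want to relate the deviance element to mu, the textbook Poisson deviance can be computed as in the sketch below. The function name `poisson_deviance` is hypothetical, and whether the package reports exactly this quantity is an assumption; it may scale it differently (for example, by the number of observations).

```r
# Textbook Poisson deviance: 2 * sum(y * log(y / mu) - (y - mu)).
poisson_deviance <- function(y, mu) {
  # y * log(y / mu) is taken to be 0 when y == 0 (its limit as y -> 0).
  term <- ifelse(y > 0, y * log(y / mu), 0)
  2 * sum(term - (y - mu))
}

# Small simulated check against base R's glm (data invented for this example).
set.seed(1)
x_sim <- rnorm(50)
y_sim <- rpois(50, exp(1 + 0.5 * x_sim))
fit_sim <- glm(y_sim ~ x_sim, family = poisson())
dev_manual <- poisson_deviance(y_sim, fitted(fit_sim))
```

Because mu is returned as a 1 x length(y) matrix, it would need to be flattened (for instance with `as.numeric`) before being passed to a function like this one.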

References

Breinlich, H., Corradi, V., Rocha, N., Ruta, M., Santos Silva, J.M.C. and T. Zylkin (2021). "Machine Learning in International Trade Research: Evaluating the Impact of Trade Agreements", Policy Research Working Paper; No. 9629. World Bank, Washington, DC.

Correia, S., P. Guimaraes and T. Zylkin (2020). "Fast Poisson estimation with high dimensional fixed effects", Stata Journal, 20, 90-115.

Gaure, S. (2013). "OLS with multiple high dimensional category variables", Computational Statistics & Data Analysis, 66, 8-18.

Friedman, J., T. Hastie, and R. Tibshirani (2010). "Regularization paths for generalized linear models via coordinate descent", Journal of Statistical Software, 33, 1-22.

Belloni, A., V. Chernozhukov, C. Hansen and D. Kozbur (2016). "Inference in high dimensional panel models with an application to gun control", Journal of Business & Economic Statistics, 34, 590-605.

Examples

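## Not run: 
# Load the package, which provides the `trade` and `countries` data sets:
library(penppml)

# To reduce run time, we keep only countries in the Americas:
americas <- countries$iso[countries$region == "Americas"]

# Columns 5 and 6 of `trade` are dropped so that they are not picked up as
# regressors (`indep` is left unspecified, so all remaining variables are used).
# The fixed effects are exporter-time, importer-time and exporter-importer.
# `lambda` is the penalty parameter; the penalty defaults to lasso.
test <- penhdfeppml(data = trade[, -(5:6)],
                    dep = "export",
                    fixed = list(c("exp", "time"),
                                 c("imp", "time"),
                                 c("exp", "imp")),
                    lambda = 0.05,
                    selectobs = (trade$imp %in% americas) & (trade$exp %in% americas))

## End(Not run)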

penppml documentation built on Sept. 8, 2023, 5:58 p.m.