penhdfeppml_cluster_int: Plugin Lasso Estimation

Description

Performs plugin lasso PPML estimation with high-dimensional fixed effects (HDFE). This is an internal function, called by mlfitppml_int and penhdfeppml_int when users select the method = "plugin" option. It is also made available as a stand-alone function for advanced users who may prefer to avoid some of the overhead imposed by the wrappers.
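PPML is usually fitted by iteratively reweighted least squares (IRLS), the algorithm whose convergence the tol and K arguments govern; init_z refers to the transformed dependent variable that IRLS works with. The sketch below is a plain, unpenalized IRLS loop without fixed effects, following the standard GLM recipe (see Correia, Guimaraes and Zylkin, 2020): at each step, regress the working variable z = (y - mu)/mu + log(mu) on x by weighted least squares with weights mu. ppml_irls is a hypothetical helper, not part of the package. As far as this page indicates, penhdfeppml_cluster_int roughly replaces the weighted least-squares step with a demeaned, penalized (glmnet) fit, which this sketch deliberately omits.

```r
# Hypothetical sketch of unpenalized PPML via IRLS (no fixed effects, no penalty).
ppml_irls <- function(y, x, tol = 1e-8, K = 100) {
  x <- as.matrix(x)
  mu <- (y + mean(y)) / 2          # strictly positive starting values
  eta <- log(mu)
  dev_old <- Inf
  for (k in seq_len(K)) {
    z <- (y - mu) / mu + eta        # transformed (working) dependent variable
    fit <- lm.wfit(x, z, w = mu)    # weighted least squares with weights mu
    beta <- fit$coefficients
    eta <- drop(x %*% beta)
    mu <- exp(eta)
    # Poisson deviance; y * log(y / mu) is taken to be 0 when y == 0
    dev <- 2 * sum(ifelse(y > 0, y * log(y / mu), 0) - (y - mu))
    # stop when the relative change in deviance falls below tol
    if (abs(dev - dev_old) / (0.1 + abs(dev)) < tol) break
    dev_old <- dev
  }
  list(coefficients = beta, mu = mu, deviance = dev, iterations = k)
}
```

On a simple model with an intercept column, the coefficients should agree with glm(..., family = poisson()) up to the convergence tolerance.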

Usage

penhdfeppml_cluster_int(
  y,
  x,
  fes,
  cluster,
  tol = 1e-08,
  hdfetol = 1e-04,
  glmnettol = 1e-12,
  penalty = "lasso",
  penweights = NULL,
  saveX = TRUE,
  mu = NULL,
  colcheck = TRUE,
  colcheck_x = colcheck,
  colcheck_x_fes = colcheck,
  K = 15,
  init_z = NULL,
  post = FALSE,
  verbose = FALSE,
  lambda = NULL,
  phipost = TRUE,
  gamma_val = NULL
)

Arguments

y

Dependent variable (a vector).

x

Regressor matrix.

fes

List of fixed effects.

cluster

Optional: a vector classifying observations into clusters (to use when calculating standard errors).

tol

Tolerance parameter for convergence of the IRLS algorithm.

hdfetol

Tolerance parameter for the within-transformation step, passed on to collapse::fhdwithin.

glmnettol

Tolerance parameter to be passed on to glmnet.

penalty

Only "lasso" is currently supported.

penweights

Optional: a vector of coefficient-specific penalties to use in the plugin lasso.

saveX

Logical. If TRUE, returns the values of x and z (the transformed dependent variable) after partialling out the fixed effects.

mu

Optional: a vector of initial values for the conditional mean mu.

colcheck

Logical. If TRUE, performs both the colcheck_x and colcheck_x_fes checks. If the user specifies colcheck_x and colcheck_x_fes individually, this option is overridden.

colcheck_x

Logical. If TRUE, checks for collinearity among the independent variables and drops collinear variables.

colcheck_x_fes

Logical. If TRUE, checks whether any independent variables are perfectly explained by the fixed effects and drops those that are.

K

Maximum number of iterations.

init_z

Optional: initial values of the transformed dependent variable, to be used in the first iteration of the algorithm.

post

Logical. If TRUE, estimates a post-penalty regression with the selected variables.

verbose

Logical. If TRUE, prints progress information to the screen during estimation.

lambda

Penalty parameter (a number).

phipost

Logical. If TRUE, the plugin coefficient-specific penalty weights are iteratively calculated using estimates from a post-penalty regression. Otherwise, these are calculated using estimates from a penalty regression.

gamma_val

Numerical value that determines the regularization threshold as defined in Belloni, Chernozhukov, Hansen and Kozbur (2016). The default (NULL) sets it to 0.1/log(n).
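The within-transformation controlled by hdfetol (which also produces x_resid and z_resid when saveX = TRUE) removes the fixed effects from each variable. The package delegates this step to collapse::fhdwithin. The base-R sketch below illustrates one standard way to do it, the method of alternating projections discussed by Gaure (2013): repeatedly subtract the (weighted) group mean for each fixed effect until the result stops changing. demean_fes is a hypothetical helper, and treating the optional weights w as the IRLS weights mu is an assumption made for illustration.

```r
# Hypothetical sketch: partial out several fixed effects by alternating projections.
demean_fes <- function(v, fes, w = rep(1, length(v)), tol = 1e-8, maxit = 1000) {
  fes <- lapply(fes, as.factor)
  for (it in seq_len(maxit)) {
    v_old <- v
    for (f in fes) {
      # weighted mean of v within each level of this fixed effect
      gm <- tapply(v * w, f, sum) / tapply(w, f, sum)
      # subtract each observation's own group mean
      v <- v - as.vector(gm)[as.integer(f)]
    }
    # stop once a full sweep over all fixed effects no longer changes v
    if (max(abs(v - v_old)) < tol) break
  }
  v
}
```

For example, demean_fes(x[, 1], fes) would partial the three fixed effects from the Examples out of the first regressor. A dedicated routine such as collapse::fhdwithin is much faster on data of realistic size.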

Details

The plugin method uses coefficient-specific penalty weights that account for heteroskedasticity. The function calculates the penalty parameters automatically, based on statistical theory. For a brief discussion, see Breinlich, Corradi, Rocha, Ruta, Santos Silva and Zylkin (2021). For a more in-depth analysis, see Belloni, Chernozhukov, Hansen and Kozbur (2016), which introduced the specific implementation used in this package. Heuristically, the penalty parameters are set high enough that a regressor is selected only if the absolute value of its score is statistically large relative to its standard error.
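As a rough illustration of this heuristic, the sketch below computes plugin penalties in the commonly cited least-squares form from Belloni et al. (2016). The overall level is lambda = 2 * c * sqrt(n) * qnorm(1 - gamma / (2 * p)), with c = 1.1 and gamma defaulting to 0.1/log(n) as documented for gamma_val. The cluster-robust loadings are phi_j = sqrt((1/n) * sum over clusters g of (sum over i in g of x_ij * e_i)^2). The helper names plugin_lambda and plugin_phi are hypothetical. The constant c and the exact scaling come from that paper rather than from this package's source. The PPML version inside penhdfeppml_cluster_int works with weighted, demeaned quantities, so its phi values (returned in the phi element) will generally differ.

```r
# Hypothetical sketch of Belloni et al. (2016)-style plugin penalties for a
# least-squares lasso; not the package's internal code.
plugin_lambda <- function(n, p, c = 1.1, gamma = 0.1 / log(n)) {
  # overall penalty level: 2 * c * sqrt(n) * qnorm(1 - gamma / (2p))
  2 * c * sqrt(n) * qnorm(1 - gamma / (2 * p))
}

plugin_phi <- function(x, resid, cluster) {
  x <- as.matrix(x)
  # score contributions x_ij * e_i (resid recycles down each column)
  scores <- x * resid
  # sum the scores within each cluster: one row per cluster
  cluster_scores <- rowsum(scores, group = cluster)
  # coefficient-specific, cluster-robust loadings phi_j
  sqrt(colSums(cluster_scores^2) / nrow(x))
}

# Usage on simulated toy data (20 clusters of 5 observations)
set.seed(42)
n <- 100
x_toy <- matrix(rnorm(n * 3), n, 3)
e_toy <- rnorm(n)
cl_toy <- rep(1:20, each = 5)
lambda_toy <- plugin_lambda(n, ncol(x_toy))
phi_toy <- plugin_phi(x_toy, e_toy, cl_toy)
```

Larger p or smaller gamma raises lambda. Regressors whose scores vary strongly across clusters get larger loadings phi_j, so they face a higher bar for selection.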

Value

An object of class elnet with the elements described in glmnet, as well as the following:

  • mu: a 1 x length(y) matrix with the final values of the conditional mean mu.

  • deviance: the deviance of the fitted model.

  • bic: Bayesian Information Criterion.

  • phi: coefficient-specific penalty weights.

  • x_resid: matrix of demeaned regressors (returned when saveX = TRUE).

  • z_resid: vector of the demeaned (transformed) dependent variable (returned when saveX = TRUE).
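This page does not define the deviance element further. Assuming it is the usual Poisson deviance evaluated at the final mu (an assumption, not confirmed here), it can be recomputed from y and mu with the hypothetical helper below. The helper flattens mu, since mu is returned as a 1 x length(y) matrix.

```r
# Hypothetical helper: Poisson deviance of y given fitted means mu.
poisson_deviance <- function(y, mu) {
  mu <- as.vector(mu)  # mu is returned as a 1 x length(y) matrix
  # y * log(y / mu) is taken to be 0 when y == 0
  term <- ifelse(y > 0, y * log(y / mu), 0)
  2 * sum(term - (y - mu))
}
```

With the objects from the Examples below, poisson_deviance(y, reg$mu) could be compared with reg$deviance as a sanity check.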

References

Belloni, A., V. Chernozhukov, C. Hansen and D. Kozbur (2016). "Inference in high dimensional panel models with an application to gun control", Journal of Business & Economic Statistics, 34, 590-605.

Breinlich, H., V. Corradi, N. Rocha, M. Ruta, J.M.C. Santos Silva and T. Zylkin (2021). "Machine Learning in International Trade Research: Evaluating the Impact of Trade Agreements", Policy Research Working Paper No. 9629, World Bank, Washington, DC.

Correia, S., P. Guimaraes and T. Zylkin (2020). "Fast Poisson estimation with high dimensional fixed effects", Stata Journal, 20, 90-115.

Friedman, J., T. Hastie and R. Tibshirani (2010). "Regularization paths for generalized linear models via coordinate descent", Journal of Statistical Software, 33, 1-22.

Gaure, S. (2013). "OLS with multiple high dimensional category variables", Computational Statistics & Data Analysis, 66, 8-18.

Examples

## Not run: 
# Load the package, which provides the trade and countries datasets:
library(penppml)
# To reduce run time, we keep only countries in Latin America and the Caribbean:
LatAmericaCar <- countries$iso[countries$sub.region == "Latin America and the Caribbean"]
trade <- trade[(trade$imp %in% LatAmericaCar) & (trade$exp %in% LatAmericaCar), ]
# Now generate the needed x, y and fes objects:
y <- trade$export
x <- data.matrix(trade[, -1:-6])
fes <- list(exp_time = interaction(trade$exp, trade$time),
            imp_time = interaction(trade$imp, trade$time),
            pair     = interaction(trade$exp, trade$imp))
# Finally, we try penhdfeppml_cluster_int:
reg <- penhdfeppml_cluster_int(y = y, x = x, fes = fes, cluster = fes$pair)

## End(Not run)


penppml documentation built on Sept. 8, 2023, 5:58 p.m.