iceberg: Iceberg Lasso Implementation (in development)

Description

This function performs standard plugin lasso PPML estimation (without fixed effects) for several dependent variables in a single step. It is still in development: at the current stage, only coefficient estimates are provided and there is no support for clustered errors.

Usage

iceberg(data, dep, indep = NULL, selectobs = NULL, ...)

Arguments

data

A data frame containing all relevant variables.

dep

A character vector with the names of the dependent variables, or their column numbers.

indep

A vector with the names or column numbers of the regressors. If left unspecified, all remaining variables (excluding fixed effects) are included in the regressor matrix.

selectobs

Optional. A vector indicating which observations to use (either a logical vector or a numeric vector with row numbers, as usual when subsetting in R).

...

Further arguments, including:

  • tol: Tolerance parameter for convergence of the IRLS algorithm.

  • glmnettol: Tolerance parameter to be passed on to glmnet::glmnet.

  • penweights: Optional: a vector of coefficient-specific penalties to use in plugin lasso.

  • colcheck: Logical. If TRUE, checks for perfect multicollinearity among the regressors.

  • K: Maximum number of iterations.

  • verbose: Logical. If TRUE, prints information to the screen while evaluating.

  • lambda: Penalty parameter (a number).

  • icepost: Logical. If TRUE, it carries out a post-lasso estimation with just the selected variables and reports the coefficients from this regression.
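The two forms accepted by selectobs can be illustrated with base R alone. The sketch below uses a small made-up data frame (toy and its columns are hypothetical and not part of the package) to show that a logical vector and a vector of row numbers pick out the same observations.

```r
# Toy data frame (hypothetical; stands in for the `data` argument)
toy <- data.frame(time = c("2015", "2016", "2016", "2017"),
                  flow = c(10, 0, 3, 7))

# Logical selector: one TRUE/FALSE value per row
sel_logical <- toy$time == "2016"

# Numeric selector: the row numbers of the same observations
sel_numeric <- which(sel_logical)

# Either form can be used to subset the rows of the data frame
rows_logical <- toy[sel_logical, ]
rows_numeric <- toy[sel_numeric, ]
```

Either sel_logical or sel_numeric could then be supplied as selectobs, assuming the function subsets rows in the usual R way, as the argument description states.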

Details

This function enables users to implement the "iceberg" step in the two-step procedure described in Breinlich, Corradi, Rocha, Ruta, Santos Silva and Zylkin (2021). To do this after using the plugin method in mlfitppml, pass all the variables with non-zero coefficients as dep and the remaining regressors as indep. The function then performs a separate lasso estimation for each of the selected dependent variables and reports the coefficients.
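The idea of running one lasso regression per selected variable can be sketched with glmnet directly. This is only an illustration under stated assumptions: the data are simulated, the names x, y, dep_a, dep_b and fit_one are hypothetical, and the penalty is a fixed, arbitrary lambda rather than the plugin penalty that iceberg uses. It assumes the glmnet package is installed.

```r
library(glmnet)  # assumed to be installed

set.seed(42)
n <- 300

# Hypothetical regressors (stand-ins for the variables passed as `indep`)
x <- matrix(rnorm(n * 4), n, 4, dimnames = list(NULL, paste0("x", 1:4)))

# Hypothetical non-negative dependent variables (stand-ins for `dep`)
y <- cbind(dep_a = rpois(n, exp(0.5 * x[, 1])),
           dep_b = rpois(n, exp(-0.5 * x[, 3])))

# One Poisson lasso per dependent variable, using a fixed, arbitrary lambda
# (NOT the plugin penalty used by iceberg)
fit_one <- function(d) {
  fit <- glmnet(x, y[, d], family = "poisson", lambda = 0.05)
  # Keep the slope coefficients only, dropping the intercept
  as.matrix(coef(fit))[colnames(x), 1]
}

# Collect the coefficients: one column per dependent variable
coefs <- sapply(colnames(y), fit_one)
```

Here coefs has one row per regressor and one column per dependent variable, which makes it easy to see which regressors are retained for each selected variable.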

Value

A matrix with coefficient estimates for all dependent variables.
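A common next step is to read off which regressors have non-zero coefficients for each dependent variable. The sketch below does this on a made-up matrix; the values, the names, and the layout (regressors in rows, dependent variables in columns) are assumptions for illustration and should be checked against the actual output.

```r
# Hypothetical coefficient matrix; the layout (regressors in rows,
# dependent variables in columns) is an assumption, not documented behaviour
coefs <- matrix(c(0.4, 0, 0,
                  0, -0.2, 0.1),
                nrow = 3,
                dimnames = list(c("x1", "x2", "x3"), c("dep_a", "dep_b")))

# For each dependent variable, list the regressors with a non-zero coefficient
selected <- lapply(colnames(coefs), function(d) {
  rownames(coefs)[coefs[, d] != 0]
})
names(selected) <- colnames(coefs)
```

The result is a named list with one character vector of regressor names per dependent variable.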

References

Breinlich, H., V. Corradi, N. Rocha, M. Ruta, J.M.C. Santos Silva and T. Zylkin (2021). "Machine Learning in International Trade Research: Evaluating the Impact of Trade Agreements", Policy Research Working Paper No. 9629. World Bank, Washington, DC.

Correia, S., P. Guimaraes and T. Zylkin (2020). "Fast Poisson estimation with high dimensional fixed effects", Stata Journal, 20, 90-115.

Gaure, S. (2013). "OLS with multiple high dimensional category variables", Computational Statistics & Data Analysis, 66, 8-18.

Friedman, J., T. Hastie, and R. Tibshirani (2010). "Regularization paths for generalized linear models via coordinate descent", Journal of Statistical Software, 33, 1-22.

Belloni, A., V. Chernozhukov, C. Hansen and D. Kozbur (2016). "Inference in high dimensional panel models with an application to gun control", Journal of Business & Economic Statistics, 34, 590-605.

Examples

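library(penppml)  # provides iceberg() and the trade data set

# Drop the first six columns of trade, then regress each listed provision
# on the remaining variables, using only observations from 2016
iceberg_results <- iceberg(data = trade[, -(1:6)],
                           dep = c("ad_prov_14", "cp_prov_23", "tbt_prov_07",
                                   "tbt_prov_33", "tf_prov_41", "tf_prov_45"),
                           selectobs = (trade$time == "2016"))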


penppml documentation built on Sept. 8, 2023, 5:58 p.m.