mlfitppml_int: General Penalized PPML Estimation


mlfitppml_int R Documentation

General Penalized PPML Estimation

Description

mlfitppml_int is the internal wrapper called by mlfitppml for penalized PPML estimation. It, in turn, calls penhdfeppml_int, penhdfeppml_cluster_int and hdfeppml_int as needed. It takes a vector with the dependent variable, a regressor matrix and a set of fixed effects in list form, where each element of the list is a separate high-dimensional fixed effect (HDFE). This is a flexible tool that allows users to select:

  • Penalty type: either lasso or ridge.

  • Penalty parameter: users can provide a single global value for lambda (a single regression is estimated), a vector of lambda values (the function estimates the regression using each of them, sequentially) or even coefficient-specific penalty weights.

  • Method: plugin lasso estimates can also be obtained directly from this function (method = "plugin").

  • Cross-validation: if this option is enabled, the function uses IDs provided by the user to perform k-fold cross-validation and reports the resulting RMSE for all lambda values.
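To make the penalty options above concrete, the sketch below writes out a penalized Poisson pseudo-likelihood objective for a given lambda, with either a lasso (absolute-value) or ridge (squared) penalty and optional coefficient-specific weights. It is only an illustration under stated assumptions: the helper name pen_ppml_objective is hypothetical, fixed effects are omitted, and the scaling (averaging the loss over n) may differ from the normalization penppml uses internally.

```r
# Hypothetical helper (not part of penppml): penalized Poisson pseudo-likelihood
# objective without fixed effects. Assumes the loss is the negative average
# Poisson log-likelihood, dropping terms that do not depend on beta.
pen_ppml_objective <- function(beta, y, x, lambda,
                               penalty = c("lasso", "ridge"),
                               weights = rep(1, length(beta))) {
  penalty <- match.arg(penalty)
  eta <- drop(x %*% beta)                  # linear index x'beta
  loss <- -mean(y * eta - exp(eta))        # Poisson pseudo-log-likelihood kernel
  pen <- if (penalty == "lasso") {
    sum(weights * abs(beta))               # L1: can set coefficients exactly to 0
  } else {
    sum(weights * beta^2)                  # L2: shrinks coefficients toward 0
  }
  loss + lambda * pen
}

set.seed(1)
x <- matrix(rnorm(200), ncol = 2)
y <- rpois(100, exp(drop(x %*% c(0.3, 0))))
b <- c(0.3, -0.2)
# Objective at a fixed beta for two lambda values and both penalty types
sapply(c(0.1, 0.01), function(l) pen_ppml_objective(b, y, x, l, "lasso"))
sapply(c(0.1, 0.01), function(l) pen_ppml_objective(b, y, x, l, "ridge"))
```

The weights argument plays the role of coefficient-specific penalty weights: setting a weight to 0 leaves that coefficient unpenalized.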

Usage

mlfitppml_int(
  y,
  x,
  fes,
  lambdas,
  penalty = "lasso",
  tol = 1e-08,
  hdfetol = 1e-04,
  colcheck = TRUE,
  colcheck_x = colcheck,
  colcheck_x_fes = colcheck,
  post = TRUE,
  cluster = NULL,
  method = "bic",
  IDs = 1:n,
  verbose = FALSE,
  xval = FALSE,
  standardize = TRUE,
  vcv = TRUE,
  phipost = TRUE,
  penweights = NULL,
  K = 15,
  gamma_val = NULL,
  mu = NULL
)

Arguments

y

Dependent variable (a vector)

x

Regressor matrix.

fes

List of fixed effects.

lambdas

Vector of penalty parameters.

penalty

A string indicating the penalty type. Currently supported: "lasso" and "ridge".

tol

Tolerance parameter for convergence of the IRLS algorithm.
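For intuition about tol, the following is a minimal, unpenalized IRLS loop for Poisson regression without fixed effects, which stops once the largest change in the coefficients falls below tol. The helper name ppml_irls and the stopping rule (maximum absolute coefficient change) are assumptions for illustration; penppml's own convergence criterion may be based on a different quantity, such as the deviance.

```r
# Hypothetical sketch: plain Poisson IRLS (no penalty, no fixed effects).
ppml_irls <- function(y, x, tol = 1e-8, maxiter = 100) {
  beta <- rep(0, ncol(x))
  mu <- (y + mean(y)) / 2                  # crude, strictly positive starting means
  eta <- log(mu)
  for (iter in seq_len(maxiter)) {
    z <- eta + (y - mu) / mu               # working dependent variable (log link)
    w <- mu                                # working weights (log link)
    beta_new <- drop(solve(crossprod(x, w * x), crossprod(x, w * z)))
    eta <- drop(x %*% beta_new)
    mu <- exp(eta)
    if (max(abs(beta_new - beta)) < tol) {
      return(list(beta = beta_new, iterations = iter))
    }
    beta <- beta_new
  }
  warning("IRLS did not converge")
  list(beta = beta, iterations = maxiter)
}

set.seed(42)
x <- cbind(1, rnorm(500))
y <- rpois(500, exp(0.2 + 0.5 * x[, 2]))
fit <- ppml_irls(y, x)
# Compare with base R's Poisson GLM
cbind(irls = fit$beta, glm = coef(glm(y ~ x[, 2], family = poisson())))
```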

hdfetol

Tolerance parameter for the within-transformation step, passed on to collapse::fhdwithin.
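The within-transformation removes several sets of fixed effects at once. penppml delegates this step to collapse::fhdwithin; the base-R sketch below instead uses the method of alternating projections (Gaure 2013): demean by each factor in turn, and repeat until the largest change falls below a tolerance that plays the role of hdfetol. The function name demean_hdfe is hypothetical, and collapse's internal algorithm and stopping rule may differ.

```r
# Hypothetical sketch: demean a vector by several factors via alternating projections.
demean_hdfe <- function(v, fes, tol = 1e-4, maxiter = 1000) {
  for (iter in seq_len(maxiter)) {
    v_old <- v
    for (f in fes) {
      v <- v - ave(v, f)                   # subtract group means for this factor
    }
    if (max(abs(v - v_old)) < tol) break   # stop when a full sweep barely changes v
  }
  v
}

set.seed(7)
n <- 300
fes <- list(a = factor(sample(1:10, n, TRUE)),
            b = factor(sample(1:15, n, TRUE)))
v <- rnorm(n) + as.numeric(fes$a) + 0.5 * as.numeric(fes$b)
v_within <- demean_hdfe(v, fes, tol = 1e-10)
# After convergence, group means are (approximately) zero for every factor
max(abs(tapply(v_within, fes$a, mean)))
max(abs(tapply(v_within, fes$b, mean)))
```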

colcheck

Logical. If TRUE, performs both the checks in colcheck_x and colcheck_x_fes. If the user specifies colcheck_x and colcheck_x_fes individually, this option is overridden.

colcheck_x

Logical. If TRUE, this checks collinearity between the independent variables and drops the collinear variables.

colcheck_x_fes

Logical. If TRUE, this checks whether the independent variables are perfectly explained by the fixed effects and drops those that are.
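One common way to detect collinear regressors is a pivoted QR decomposition: columns beyond the numerical rank are linear combinations of the others. The sketch below (hypothetical helper drop_collinear) illustrates the idea behind colcheck_x; it is not necessarily how penppml implements the check. For colcheck_x_fes, the analogous idea would be to apply the within-transformation first and flag columns whose demeaned values are numerically zero.

```r
# Hypothetical sketch: drop linearly dependent columns using a pivoted QR.
drop_collinear <- function(x, tol = 1e-7) {
  q <- qr(x, tol = tol)
  keep <- sort(q$pivot[seq_len(q$rank)])   # columns that span the column space
  x[, keep, drop = FALSE]
}

set.seed(3)
x <- matrix(rnorm(300), ncol = 3, dimnames = list(NULL, c("a", "b", "c")))
x <- cbind(x, d = x[, "a"] + 2 * x[, "b"])   # exact linear combination of a and b
colnames(drop_collinear(x))                  # "a" "b" "c"
```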

post

Logical. If TRUE, estimates a post-penalty regression with the selected variables.

cluster

Optional: a vector classifying observations into clusters (to use when calculating SEs).

method

The user can set this equal to "plugin" to perform the plugin algorithm with coefficient-specific penalty weights (see details). Otherwise, a single global penalty is used.

IDs

A vector of fold IDs for k-fold cross-validation. If left unspecified, each observation is assigned to a different fold, i.e. leave-one-out cross-validation (warning: this is likely to be very resource-intensive).
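To avoid the costly default, a coarser vector of fold IDs can be passed instead. A minimal way to build k roughly equal-sized random folds in base R is shown below, assuming IDs only needs to contain one integer fold label per observation; make_folds is a hypothetical helper.

```r
# Hypothetical helper: assign n observations at random to k folds of (nearly) equal size.
make_folds <- function(n, k = 10, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  sample(rep_len(seq_len(k), n))           # balanced labels, randomly permuted
}

IDs <- make_folds(n = 103, k = 10, seed = 123)
table(IDs)   # three folds have 11 observations, the other seven have 10
```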

verbose

Logical. If TRUE, it prints information to the screen while evaluating.

xval

Logical. If TRUE, it carries out cross-validation.

standardize

Logical. If TRUE, x variables are standardized before estimation.
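Standardizing puts all regressors on a common scale, so a single lambda penalizes them comparably. The sketch below shows scaling to unit standard deviation and how a coefficient estimated on the standardized scale maps back to the original units. Whether penppml also centers the variables, and whether it rescales reported coefficients automatically, are assumptions not settled by this page.

```r
# Scale each column to unit standard deviation, keeping the sds to undo it later.
set.seed(11)
x <- cbind(small = rnorm(50, sd = 0.01), large = rnorm(50, sd = 100))
sds <- apply(x, 2, sd)
x_std <- sweep(x, 2, sds, "/")
apply(x_std, 2, sd)          # both columns now have sd 1

# A coefficient b_std on a standardized column corresponds to b_std / sd on the
# original column, because x_std %*% b_std equals x %*% (b_std / sds).
b_std <- c(0.2, -0.1)
all.equal(drop(x_std %*% b_std), drop(x %*% (b_std / sds)))
```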

vcv

Logical. If TRUE (the default), the post-estimation model includes standard errors.

phipost

Logical. Used when method == "plugin". If TRUE, the coefficient-specific penalty weights are iteratively calculated using estimates from a post-penalty regression; otherwise, they are calculated using estimates from the penalized regression.

penweights

Optional: a vector of coefficient-specific penalties to use in plugin lasso when method == "plugin".

K

Maximum number of iterations for the plugin algorithm to converge.

gamma_val

Numerical value that determines the regularization threshold as defined in Belloni, Chernozhukov, Hansen, and Kozbur (2016). The default, NULL, sets the parameter to 0.1/log(n), where n is the number of observations.
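The default threshold is easy to compute. In plug-in rules in the style of Belloni et al., gamma then enters the penalty level through a Gaussian quantile, qnorm(1 - gamma / (2 * p)), where p is the number of regressors. The sketch below computes both; the form 2 * const * sqrt(n) * qnorm(...) with const = 1.1 follows the usual presentation in that literature and is an assumption here, not necessarily the exact scaling penppml uses. Both function names are hypothetical.

```r
# Default regularization threshold described above: gamma = 0.1 / log(n)
default_gamma <- function(n) 0.1 / log(n)

# Hypothetical plug-in penalty level in the style of Belloni et al.:
# lambda = 2 * const * sqrt(n) * qnorm(1 - gamma / (2 * p))
plugin_lambda <- function(n, p, gamma = default_gamma(n), const = 1.1) {
  2 * const * sqrt(n) * qnorm(1 - gamma / (2 * p))
}

default_gamma(10000)                 # about 0.0109
plugin_lambda(n = 10000, p = 50)
```

A smaller gamma (or more regressors p) pushes the quantile further into the tail and therefore raises the penalty level.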

mu

A vector of initial values for mu that can be passed to the command.

Details

For technical details on the algorithms used, see hdfeppml_int (post-lasso regression), penhdfeppml_int (standard penalized regression), penhdfeppml_cluster_int (plugin lasso), and xvalidate (cross-validation).

Value

A list with the following elements:

  • beta: if post = FALSE, a length(lambdas) x ncol(x) matrix with coefficient (beta) estimates from the penalized regressions. If post = TRUE, this is the matrix of coefficients from the post-penalty regressions.

  • beta_pre: if post = TRUE, a length(lambdas) x ncol(x) matrix with coefficient (beta) estimates from the penalized regressions.

  • bic: Bayesian Information Criterion.

  • lambdas: vector of penalty parameters.

  • ses: standard errors of the coefficients of the post-penalty regression. Note that these are only provided when post = TRUE.

  • rmse: if xval = TRUE, a matrix with the root mean squared error (RMSE - column 2) for each value of lambda (column 1), obtained by cross-validation.

  • phi: coefficient-specific penalty weights (only if method == "plugin").
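Given the structure described above, choosing the cross-validated lambda and listing the regressors it keeps takes only a few lines. The sketch uses a small hand-built list with made-up numbers and hypothetical column names (x1 to x4) in place of a real fitted object, purely to show the indexing; it matches rows of beta to lambda values rather than assuming a particular row order.

```r
# Toy stand-in for a fitted object: 3 lambdas x 4 regressors (values are made up).
reg <- list(
  lambdas = c(0.1, 0.01, 0.001),
  beta = matrix(c(0,    0,    0.4,  0,
                  0.2,  0,    0.5,  0,
                  0.25, 0.05, 0.5, -0.1),
                nrow = 3, byrow = TRUE,
                dimnames = list(NULL, c("x1", "x2", "x3", "x4"))),
  rmse = cbind(lambda = c(0.1, 0.01, 0.001), rmse = c(5.2, 4.8, 5.0))
)

# Lambda with the lowest cross-validated RMSE (column 1 = lambda, column 2 = RMSE)
best_lambda <- reg$rmse[which.min(reg$rmse[, 2]), 1]
# Row of beta estimated with that lambda
i <- which(reg$lambdas == best_lambda)
selected <- colnames(reg$beta)[reg$beta[i, ] != 0]
selected   # "x1" "x3"
```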

References

Breinlich, H., Corradi, V., Rocha, N., Ruta, M., Santos Silva, J.M.C. and T. Zylkin (2021). "Machine Learning in International Trade Research: Evaluating the Impact of Trade Agreements", Policy Research Working Paper; No. 9629. World Bank, Washington, DC.

Correia, S., P. Guimaraes and T. Zylkin (2020). "Fast Poisson estimation with high dimensional fixed effects", Stata Journal, 20, 90-115.

Gaure, S. (2013). "OLS with multiple high dimensional category variables", Computational Statistics & Data Analysis, 66, 8-18.

Friedman, J., T. Hastie, and R. Tibshirani (2010). "Regularization paths for generalized linear models via coordinate descent", Journal of Statistical Software, 33, 1-22.

Belloni, A., V. Chernozhukov, C. Hansen and D. Kozbur (2016). "Inference in high dimensional panel models with an application to gun control", Journal of Business & Economic Statistics, 34, 590-605.

Examples

## Not run: 
library(penppml)
# First, we need to transform the data (this is what mlfitppml handles internally). Start by
# filtering the data set to keep only countries in the Americas:
americas <- countries$iso[countries$region == "Americas"]
trade <- trade[(trade$imp %in% americas) & (trade$exp %in% americas), ]
# Now generate the needed x, y and fes objects:
y <- trade$export
x <- data.matrix(trade[, -1:-6])
fes <- list(exp_time = interaction(trade$exp, trade$time),
            imp_time = interaction(trade$imp, trade$time),
            pair     = interaction(trade$exp, trade$imp))
# Finally, we try mlfitppml_int with a lasso penalty (the default) and two lambda values:
reg <- mlfitppml_int(y = y, x = x, fes = fes, lambdas = c(0.1, 0.01))

# We can also try plugin lasso:
reg <- mlfitppml_int(y = y, x = x, fes = fes, cluster = fes$pair, method = "plugin")

# For an example with cross-validation, please see the vignette.

## End(Not run)


penppml documentation built on Sept. 8, 2023, 5:58 p.m.