glmhdfe – Package for R

The R package glmhdfe estimates generalized linear models with high-dimensional fixed effects. It exploits a convenient property of some combinations of error distribution and link function: conditional on all other estimated parameters, the fixed effects have an explicit solution.

Consider the following equation that we want to estimate:

[Equation not preserved: the GLM specification with fixed effects]

The first order conditions for fixed effects can be simplified to

[Equation not preserved: the simplified first-order conditions for the fixed effects]
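In generic GLM notation, the model and the score equation for a single fixed effect can be sketched as follows. The symbols are assumptions chosen for illustration and are not necessarily those of the technical note: i indexes observations, k the fixed-effect dimensions, j_k(i) the group of observation i in dimension k, g is the link and V the variance function. The last line works out the Poisson case with log link, where the denominator of the score equals one and the fixed effect can be solved for directly.

```latex
% Assumed notation, for illustration only.
\begin{align*}
\mu_i &= g^{-1}(\eta_i), \qquad \eta_i = x_i'\beta + \sum_{k=1}^{K} \delta_{k,\,j_k(i)} \\
0 &= \sum_{i:\, j_k(i) = j} \frac{y_i - \mu_i}{V(\mu_i)\, g'(\mu_i)}
   \qquad \text{(score for } \delta_{k,j}\text{)} \\
\exp(\delta_{k,j}) &= \frac{\sum_{i:\, j_k(i) = j} y_i}{\sum_{i:\, j_k(i) = j} \exp(\eta_i - \delta_{k,j})}
   \qquad \text{(Poisson, log link: } V(\mu) = \mu,\ g'(\mu) = 1/\mu\text{)}
\end{align*}
```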

For certain distribution and link combinations this yields explicit solutions for the estimated fixed-effect coefficients (the deltas), given estimates for beta and the other deltas. Specifically, this is the case for the Gaussian distribution with identity and log link, and for the Poisson, Gamma and Inverse Gaussian distributions with log link. The fixed effects can therefore be updated separately from the coefficients on the variables of interest in every iteration of the IRLS procedure used to estimate beta, which speeds up estimation dramatically.
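To make the idea concrete, here is a minimal sketch in base R for the Poisson case with a single fixed-effect dimension. It is not the glmhdfe implementation: the package updates the fixed effects inside the IRLS iterations and builds on data.table, whereas this sketch simply alternates between the closed-form fixed-effect update and a full glm() fit for beta with the fixed effects held as an offset. The function name fit_poisson_fe and the simulated data are hypothetical.

```r
# Hypothetical helper for illustration: Poisson with log link, one fixed-effect dimension.
fit_poisson_fe <- function(y, x, g, tol = 1e-8, max_iter = 100) {
  # Closed form given beta: exp(delta_j) = sum_j(y) / sum_j(exp(x * beta)).
  # A group whose outcomes are all zero would give delta_j = -Inf; not handled here.
  update_delta <- function(beta) {
    log(tapply(y, g, sum) / tapply(exp(x * beta), g, sum))
  }
  beta <- 0
  for (iter in seq_len(max_iter)) {
    delta <- update_delta(beta)
    off <- as.numeric(delta[as.character(g)])
    # Given the fixed effects, beta comes from a small GLM with them as an offset.
    fit <- glm(y ~ 0 + x + offset(off), family = poisson(link = "log"))
    beta_new <- coef(fit)[["x"]]
    converged <- abs(beta_new - beta) < tol
    beta <- beta_new
    if (converged) break
  }
  list(beta = beta, delta = update_delta(beta), iterations = iter)
}

# Simulated data (made up): 50 groups, one regressor.
set.seed(1)
n <- 2000
g <- sample(50, n, replace = TRUE)
x <- rnorm(n)
y <- rpois(n, exp(0.3 * x + rnorm(50, sd = 0.5)[g]))

res <- fit_poisson_fe(y, x, g)
# Reference: the same model with explicit group dummies.
ref <- coef(glm(y ~ x + factor(g), family = poisson(link = "log")))[["x"]]
c(alternating = res$beta, dummies = ref)  # the two estimates should agree closely
```

The dummy-variable reference fit has to build and decompose a design matrix with one column per group, which is what becomes infeasible when the number of fixed effects is large. The alternating update only needs group-wise sums.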

For more detail on the inner workings see the technical note. A Stata implementation is coming soon.

Implementation in R

The R package glmhdfe implements this "trick" and uses the data.table package for speed. For smaller datasets or other error distributions we recommend the feglm function in Amrei Stammann's alpaca package, which also supports high-dimensional fixed effects in GLM estimation.

Installation

Install from GitHub via the remotes package:

remotes::install_github("julianhinz/R_glmhdfe")

Examples

The glmhdfe function has a syntax similar to that of the felm function from the lfe package and the feglm function in the alpaca package:

glmhdfe(trade ~ fta | iso_o_year + iso_d_year + iso_o_iso_d | iso_o + iso_d + year,
        family = poisson(link = "log"),
        data = data)

The formula has up to three parts, separated by |. The first part specifies the dependent variable and the regressors as usual. The second part specifies the fixed-effect dimensions. The third part, which is optional, specifies the variables on which the standard errors are clustered.
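The formula in the example refers to interacted identifiers such as iso_o_year. As a sketch, assuming these are ordinary columns in the data (the names suggest exporter-year, importer-year and country-pair identifiers, but the source does not show how they were built), a toy panel with that layout could be set up in base R like this. All values are made up.

```r
# Toy bilateral trade panel; every value here is invented for illustration.
data <- expand.grid(iso_o = c("DEU", "FRA", "USA"),
                    iso_d = c("DEU", "FRA", "USA"),
                    year  = 2000:2002,
                    stringsAsFactors = FALSE)
data <- data[data$iso_o != data$iso_d, ]  # drop flows from a country to itself

# Hypothetical trade agreement between DEU and FRA from 2001 onwards.
deu_fra <- (data$iso_o == "DEU" & data$iso_d == "FRA") |
           (data$iso_o == "FRA" & data$iso_d == "DEU")
data$fta <- as.integer(deu_fra & data$year >= 2001)

set.seed(42)
data$trade <- rpois(nrow(data), lambda = 100 * (1 + data$fta))

# Interacted identifiers used as fixed-effect dimensions in the formula.
data$iso_o_year  <- paste(data$iso_o, data$year, sep = "_")
data$iso_d_year  <- paste(data$iso_d, data$year, sep = "_")
data$iso_o_iso_d <- paste(data$iso_o, data$iso_d, sep = "_")

str(data)
```

A panel this small only shows the column layout. It is far too small to identify anything once all three sets of fixed effects are included.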

Options

There are numerous options to tweak the estimation procedure:

Other functions

There is the usual battery of generic functions, like coef, summary, etc. Furthermore, if for some reason you want to (re-)estimate the variance-covariance matrix afterwards, or change the level of clustering, you can do so with the compute_vcov command:

compute_vcov(data, call, info)

You need to supply data (ideally as a glmhdfe_data object), call (which carries the information on clustering and the variables of interest), and info (which carries the degrees of freedom and related information).
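To illustrate what changing the level of clustering means, here is a generic cluster-robust sandwich estimator for a Poisson model fitted with base R's glm(). This is a textbook sketch, not the code behind compute_vcov: the function name cluster_vcov_poisson is hypothetical, and it applies no degrees-of-freedom or small-sample correction, which the package may well do.

```r
# Hypothetical helper: cluster-robust sandwich for a Poisson log-link glm() fit.
cluster_vcov_poisson <- function(fit, cluster) {
  X  <- model.matrix(fit)
  mu <- fitted(fit)
  y  <- fit$y
  bread  <- solve(crossprod(X * sqrt(mu)))   # (X' diag(mu) X)^{-1}
  scores <- X * (y - mu)                     # per-observation score contributions
  cluster_scores <- rowsum(scores, cluster)  # sum the scores within each cluster
  meat <- crossprod(cluster_scores)
  bread %*% meat %*% bread                   # no small-sample adjustment
}

# Simulated data (made up): 300 observations in 20 clusters.
set.seed(2)
n  <- 300
cl <- sample(20, n, replace = TRUE)
x  <- rnorm(n)
y  <- rpois(n, exp(0.2 + 0.5 * x))
fit <- glm(y ~ x, family = poisson(link = "log"))

V_cl  <- cluster_vcov_poisson(fit, cl)           # clustered on cl
V_obs <- cluster_vcov_poisson(fit, seq_len(n))   # every observation its own cluster
sqrt(diag(V_cl))
```

Passing a different grouping vector changes only the middle term of the sandwich, so the point estimates stay untouched. That is why the variance-covariance matrix can be recomputed after estimation.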

Roadmap

Bugs?



julianhinz/R_glmhdfe documentation built on Feb. 11, 2022, 7:37 a.m.