knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
This vignette is adapted from https://github.com/sergiocorreia/ppmlhdfe/blob/master/guides/nonexistence_benchmarks.md#r-packages.
library(capybara)
The table below shows a dataset with 12 observations and four regressors. Observations 5 is separated because $y=0$ and $z \neq x_2 - x_1$ is positive in those observations, and zero otherwise.
correia2019$example1 which(correia2019$example1$x2 - correia2019$example1$x1 != 0 & correia2019$example1$y == 0)
# A tibble: 12 × 5
y x1 x2 x3 x4
<dbl> <dbl> <dbl> <dbl> <dbl>
1 0 0 0 2 10
2 0 0 0 0 -2
3 0 0 0 0 6
4 0 0 0 4 5
5 0 1 0 0 3
6 2 0 0 0 3
7 2 0 0 0 4
8 2 0 0 -2 15
9 2 0 0 0 -7
10 4 0 0 -3 15
11 6 -3 -3 0 4
12 6 0 0 0 4
[1] 5
Base R shall not give a warning when estimating a Poisson model on this data.
glm( y ~ x1 + x2 + x3 + x4, data = correia2019$example1, family = poisson() )
Call: glm(formula = y ~ x1 + x2 + x3 + x4, family = poisson(), data = correia2019$example1) Coefficients: (Intercept) x1 x2 x3 x4 0.59095 -17.78017 17.32952 -0.47085 -0.03779 Degrees of Freedom: 11 Total (i.e. Null); 7 Residual Null Deviance: 31.91 Residual Deviance: 15.96 AIC: 46.99
Capybara will detect separation on this dataset.
fepoisson( y ~ x1 + x2 + x3 + x4, data = correia2019$example1 )
Separation found in 1 observation(s) Formula: y ~ x1 + x2 + x3 + x4 Family: Poisson Estimates: | | Estimate | Std. Error | z value | Pr(>|z|) | |-------------|----------|------------|---------|-----------| | (Intercept) | 0.5910 | 0.3029 | 1.9510 | 0.0511 + | | x1 | -0.4507 | 0.1648 | -2.7352 | 0.0062 ** | | x2 | NA | NA | NA | NA | | x3 | -0.4708 | 0.2311 | -2.0367 | 0.0417 * | | x4 | -0.0378 | 0.0438 | -0.8634 | 0.3879 | Significance codes: ** p < 0.01; * p < 0.05; + p < 0.10 Pseudo R-squared: 0.4813 Number of observations: Full 12; Separated 1; Perfect classification 0 Number of Fisher Scoring iterations: 5
The table below shows a different dataset with 12 observations and four regressors. Observations 2, 3, 6, 7 and 8 are separated because $y=0$ and $z > x_2 - x_1$ is positive in those observations, and zero otherwise.
correia2019$example2 which(correia2019$example2$x2 - correia2019$example2$x1 > 0 & correia2019$example2$y == 0)
# A tibble: 12 × 5 y x1 x2 x3 x4 <dbl> <dbl> <dbl> <dbl> <dbl> 1 0 14 4 0 -12 2 0 0 35 34 12 3 0 0 3 0 17 4 0 0 0 0 1 5 0 0 0 -2 7 6 0 0 25 0 12 7 0 0 13 0 60 8 0 0 15 0 7 9 0 1 0 7 -24 10 9 0 0 0 18 11 4 2 0 6 -1 12 2 1 0 0 -7 [1] 2 3 6 7 8
Base R shall give a convergence warning when estimating a Poisson model on this data.
glm( y ~ x1 + x2 + x3 + x4, data = correia2019$example2, family = poisson() )
Call: glm(formula = y ~ x1 + x2 + x3 + x4, family = poisson(), data = correia2019$example2) Coefficients: (Intercept) x1 x2 x3 x4 -367.83 512.42 -1644.86 -105.85 20.56 Degrees of Freedom: 11 Total (i.e. Null); 7 Residual Null Deviance: 46.72 Residual Deviance: 9.672e-06 AIC: 19.93 Warning messages: 1: glm.fit: algorithm did not converge 2: glm.fit: fitted rates numerically 0 occurred
Capybara will detect separation on this dataset.
fepoisson( y ~ x1 + x2 + x3 + x4, data = correia2019$example2 )
Separation found in 6 observation(s) Formula: y ~ x1 + x2 + x3 + x4 Family: Poisson Estimates: | | Estimate | Std. Error | z value | Pr(>|z|) | |-------------|-----------|------------|---------|----------| | (Intercept) | -239.2049 | 1134.4711 | -0.2109 | 0.8330 | | x1 | 333.7767 | 1575.6630 | 0.2118 | 0.8322 | | x2 | NA | NA | NA | NA | | x3 | -68.9251 | 325.6385 | -0.2117 | 0.8324 | | x4 | 13.4112 | 63.0263 | 0.2128 | 0.8315 | Significance codes: ** p < 0.01; * p < 0.05; + p < 0.10 Pseudo R-squared: 1.00 Number of observations: Full 12; Separated 6; Perfect classification 0 Number of Fisher Scoring iterations: 25
Base R shall not give a warning when estimating a Poisson model with fixed effects on the dataset below. The separation in this case is less clear, which motivates why capybara uses linear programming to detect separation in Poisson models.
correia2019$fe1
# A tibble: 18 × 5
y x1 x2 i j
<dbl> <dbl> <dbl> <dbl> <dbl>
1 2 0 0 2 1
2 0 0 0 4 2
3 0 0 0 1 1
4 1 1 0 4 3
5 0 0 1 2 2
6 0 0 0 2 2
7 1 0 0 5 4
8 0 1 2 2 3
9 1 0 0 1 1
10 0 0 0 2 2
11 0 2 0 1 3
12 0 0 0 1 3
13 0 1 0 2 2
14 0 0 1 5 2
15 0 0 1 2 4
16 0 0 0 1 2
17 0 0 0 1 1
18 2 0 0 1 2
Base R shall not give a warning when estimating a Poisson model with fixed effects on this data.
glm( y ~ x1 + x2 + factor(i) + factor(j), data = correia2019$fe1, family = poisson() )
Call: glm(formula = y ~ x1 + x2 + factor(i) + factor(j), family = poisson(), data = correia2019$fe1) Coefficients: (Intercept) x1 x2 factor(i)2 factor(i)4 factor(i)5 -0.4114 -0.4845 -18.5226 0.4233 0.7375 0.9101 factor(j)2 factor(j)3 factor(j)4 -0.9854 -0.5696 -0.4986 Degrees of Freedom: 17 Total (i.e. Null); 9 Residual Null Deviance: 18.77 Residual Deviance: 13.36 AIC: 42.59 Warning message: glm.fit: fitted rates numerically 0 occurred
Capybara will detect separation when estimating Poisson models.
fepoisson( y ~ x1 + x2 | i + j, data = correia2019$fe1 )
Separation found in 4 observation(s) Formula: y ~ x1 + x2 | i + j Family: Poisson Estimates: | | Estimate | Std. Error | z value | Pr(>|z|) | |----|----------|------------|---------|----------| | x1 | -0.4811 | 1.2419 | -0.3874 | 0.6984 | | x2 | NA | NA | NA | NA | Significance codes: ** p < 0.01; * p < 0.05; + p < 0.10 Pseudo R-squared: 0.1874 Fixed effects: i: 4 j: 4 Number of observations: Full 18; Separated 4; Perfect classification 0 Number of Fisher Scoring iterations: 4
'correia2019' (Stata) will also detect separation when estimating Poisson models with fixed effects.
. correia2019 y x1 x2, a(i j) (simplex method dropped 4 separated observations) (dropped 1 singleton observations) note: 1 variable omitted because of collinearity: x2 $$ Stopping (no negative residuals); separation found in 0 observations (1 iterations and 732 subiterations) Iteration 1: deviance = 1.4149e+01 eps = . iters = 4 tol = 1.0e-04 ... Iteration 6: deviance = 1.3364e+01 eps = 1.85e-16 iters = 3 tol = 1.0e-07 ------------------------------------------------------------------------------- (legend: p: exact partial-out s: exact solver h: step-halving o: epsilon > below tolerance) Converged in 6 iterations and 21 HDFE sub-iterations (tol = 1.0e-08) HDFE PPML regression No. of obs = 13 Absorbing 2 HDFE groups Residual df = 7 Wald chi2(1) = 0.53 Deviance = 13.36443052 Prob > chi2 = 0.4686 Log pseudolikelihood = -11.2959209 Pseudo R2 = 0.0607 ------------------------------------------------------------------------------ | Robust y | Coef. Std. Err. z P>|z| [95% Conf. Interval] -------------+---------------------------------------------------------------- x1 | -.4845469 .6685201 -0.72 0.469 -1.794822 .8257284 x2 | 0 (omitted) _cons | -.5708466 .4604099 -1.24 0.215 -1.473233 .3315402 ------------------------------------------------------------------------------ Absorbed degrees of freedom: -----------------------------------------------------+ Absorbed FE | Categories - Redundant = Num. Coefs | -------------+---------------------------------------| i | 3 0 3 | j | 3 1 2 | -----------------------------------------------------+
A difference with respect to Stata's 'correia2019' is that capybara requires an explicit cluster term in the formula to compute robust standard errors using a sandwich operation.
correia2019$fe1$pair <- paste0(correia2019$fe1$i, correia2019$fe1$j) fepoisson( y ~ x1 + x2 | i + j | pair, data = correia2019$fe1 )
Separation found in 4 observation(s) Formula: y ~ x1 + x2 | i + j | pair Family: Poisson Estimates: | | Estimate | Std. Error | z value | Pr(>|z|) | |----|----------|------------|---------|----------| | x1 | -0.4811 | 0.3101 | -1.5514 | 0.1208 | | x2 | NA | NA | NA | NA | Significance codes: ** p < 0.01; * p < 0.05; + p < 0.10 Pseudo R-squared: 0.1874 Fixed effects: i: 4 j: 4 Number of observations: Full 18; Separated 4; Perfect classification 0 Number of Fisher Scoring iterations: 4
Other R packages may not detect separation in Poisson models with fixed effects. By disabling the separation check in Capybara, we can match 'alpaca' and 'fixest' results.
Capybara without checking separation (incorrect $\beta_2$):
fepoisson( y ~ x1 + x2 | i + j, data = correia2019$fe1, control = list(check_separation = FALSE) )
Formula: y ~ x1 + x2 | i + j Family: Poisson Estimates: | | Estimate | Std. Error | z value | Pr(>|z|) | |----|----------|------------|---------|----------| | x1 | -0.4845 | 1.2439 | -0.3895 | 0.6969 | | x2 | -12.3711 | 383.1463 | -0.0323 | 0.9742 | Significance codes: ** p < 0.01; * p < 0.05; + p < 0.10 Pseudo R-squared: 0.2614 Fixed effects: i: 4 j: 4 Number of observations: Full 18; Missing 0; Perfect classification 0 Number of Fisher Scoring iterations: 14
Alpaca:
alpaca::feglm( y ~ x1 + x2 | i + j, data = correia2019$fe1, family = poisson() )
poisson - log link, l= [4, 4] x1 x2 -0.4845 -17.3711
Fixest:
fixest::feglm( y ~ x1 + x2 | i + j, data = correia2019$fe1, family = poisson() )
GLM estimation, family = poisson, Dep. Var.: y Observations: 18 Fixed-effects: i: 4, j: 4 Standard-errors: IID Estimate Std. Error z value Pr(>|z|) x1 -0.484547 1.70957 -0.283432 0.77685 x2 -18.514805 6880.80328 -0.002691 0.99785 --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 Log-Likelihood: -12.3 Adj. Pseudo R2: -0.353285 BIC: 50.6 Squared Cor.: 0.261426
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.