cluster.im.glm: Cluster-Adjusted Confidence Intervals And p-Values For GLM

Description Usage Arguments Value Note Author(s) References Examples

View source: R/clusterIM.glm.R

Description

Computes p-values and confidence intervals for GLM models based on cluster-specific model estimation (Ibragimov and Muller 2010). A separate model is estimated in each cluster, and then p-values and confidence intervals are computed based on a t/normal distribution of the cluster-specific estimates.

Usage

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
cluster.im.glm(
  mod,
  dat,
  cluster,
  ci.level = 0.95,
  report = TRUE,
  drop = FALSE,
  truncate = FALSE,
  return.vcv = FALSE
)

Arguments

mod

A model estimated using glm.

dat

The data set used to estimate mod.

cluster

A formula of the clustering variable.

ci.level

What confidence level should CIs reflect?

report

Should a table of results be printed to the console?

drop

Should clusters within which a model cannot be estimated be dropped?

truncate

Should outlying cluster-specific beta estimates be excluded?

return.vcv

Should a VCV matrix and the means of cluster-specific coefficient estimates be returned?

Value

A list with the elements

p.values

A matrix of the estimated p-values.

ci

A matrix of confidence intervals.

vcv.hat

Optional: A cluster-level variance-covariance matrix for coefficient estimates.

beta.bar

Optional: A vector of means for cluster-specific coefficient estimates.

Note

Confidence intervals are centered on the cluster averaged estimate, which can diverge from original model estimates under several circumstances (e.g., if clusters have different numbers of observations). Consequently, confidence intervals may not be centered on original model estimates. If drop = TRUE, any cluster for which all coefficients cannot be estimated will be automatically dropped from the analysis. If truncate = TRUE, any cluster for which any coefficient is more than 6 times the interquartile range from the cross-cluster mean will also be dropped as an outlier.

Author(s)

Justin Esarey

References

Esarey, Justin, and Andrew Menger. 2017. "Practical and Effective Approaches to Dealing with Clustered Data." Political Science Research and Methods forthcoming: 1-35. <URL:http://jee3.web.rice.edu/cluster-paper.pdf>.

Ibragimov, Rustam, and Ulrich K. Muller. 2010. "t-Statistic Based Correlation and Heterogeneity Robust Inference." Journal of Business & Economic Statistics 28(4): 453-468. <DOI:10.1198/jbes.2009.08046>.

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
## Not run: 

#####################################################################
# example one: predict whether respondent has a university degree
#####################################################################

require(effects)
data(WVS)
logit.model <- glm(degree ~ religion + gender + age, data=WVS, family=binomial(link="logit"))
summary(logit.model)

# compute cluster-adjusted p-values
clust.im.p <- cluster.im.glm(logit.model, WVS, ~ country, report = T)

############################################################################
# example two: linear model of whether respondent has a university degree
#              with interaction between gender and age + country FEs
############################################################################

WVS$degree.n <- as.numeric(WVS$degree)
WVS$gender.n <- as.numeric(WVS$gender)
WVS$genderXage <- WVS$gender.n * WVS$age
lin.model <- glm(degree.n ~ gender.n + age + genderXage + religion + as.factor(country), data=WVS)

# compute marginal effect of male gender on probability of obtaining a university degree
# using conventional standard errors
age.vec <- seq(from=18, to=90, by=1)
me.age <- coefficients(lin.model)[2] + coefficients(lin.model)[4]*age.vec
plot(me.age ~ age.vec, type="l", ylim=c(-0.1, 0.1), xlab="age", 
     ylab="ME of male gender on Pr(university degree)")
se.age <- sqrt( vcov(lin.model)[2,2] + vcov(lin.model)[4,4]*(age.vec)^2 + 
                2*vcov(lin.model)[2,4]*age.vec)
ci.h <- me.age + qt(0.975, lower.tail=T, df=lin.model$df.residual) * se.age
ci.l <- me.age - qt(0.975, lower.tail=T, df=lin.model$df.residual) * se.age
lines(ci.h ~ age.vec, lty=2)
lines(ci.l ~ age.vec, lty=2)


# cluster on country, compute CIs for marginal effect of gender on degree attainment
# drop the FEs (absorbed into cluster-level coefficients)
lin.model.n <- glm(degree.n ~ gender.n + age + genderXage + religion, data=WVS)
clust.im.result <- cluster.im.glm(lin.model.n, WVS, ~ country, report = T, return.vcv = T)
# compute ME using average of cluster-level estimates (CIs center on this)
me.age.im <- clust.im.result$beta.bar[2] + clust.im.result$beta.bar[4]*age.vec
se.age.im <- sqrt( clust.im.result$vcv[2,2] + clust.im.result$vcv[4,4]*(age.vec)^2 +
                   2*clust.im.result$vcv[2,4]*age.vec)
# center the CIs on the ME using average of cluster-level estimates
# important: divide by sqrt(G) to convert SE of cluster-level estimates 
#            into SE of the mean, where G = number of clusters
G <- length(unique(WVS$country))
ci.h.im <- me.age.im + qt(0.975, lower.tail=T, df=(G-1)) * se.age.im/sqrt(G)
ci.l.im <- me.age.im - qt(0.975, lower.tail=T, df=(G-1)) * se.age.im/sqrt(G)
plot(me.age.im ~ age.vec, type="l", ylim=c(-0.2, 0.2), xlab="age", 
     ylab="ME of male gender on Pr(university degree)")
lines(ci.h.im ~ age.vec, lty=2)
lines(ci.l.im ~ age.vec, lty=2)
# for comparison, here's the ME estimate and CIs from the baseline model
lines(me.age ~ age.vec, lty=1, col="gray")
lines(ci.h ~ age.vec, lty=3, col="gray")
lines(ci.l ~ age.vec, lty=3, col="gray")


## End(Not run)

Example output

Loading required package: AER
Loading required package: car
Loading required package: carData
Loading required package: lmtest
Loading required package: zoo

Attaching package: 'zoo'

The following objects are masked from 'package:base':

    as.Date, as.Date.numeric

Loading required package: sandwich
Loading required package: survival
Loading required package: Formula
Loading required package: plm

 When using this package, cite: 
 
 Justin Esarey and Andrew Menger (2017). 
 "Practical and Effective Approaches to Dealing with Clustered Data." 
 Political Science Research and Methods, FirstView, 1-35. 
 URL: https://doi.org/10.1017/psrm.2017.42. 

Loading required package: effects
lattice theme set by effectsTheme()
See ?effectsTheme for details.

Call:
glm(formula = degree ~ religion + gender + age, family = binomial(link = "logit"), 
    data = WVS)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-0.8211  -0.7283  -0.6599  -0.5676   2.0293  

Coefficients:
             Estimate Std. Error z value Pr(>|z|)    
(Intercept) -0.788616   0.125616  -6.278 3.43e-10 ***
religionyes  0.112407   0.096764   1.162    0.245    
gendermale  -0.076761   0.067243  -1.142    0.254    
age         -0.013215   0.002031  -6.507 7.67e-11 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 5565.4  on 5380  degrees of freedom
Residual deviance: 5519.4  on 5377  degrees of freedom
AIC: 5527.4

Number of Fisher Scoring iterations: 4


 Cluster-Adjusted p-values:  
 
           variable name   cluster-adjusted p-value
             (Intercept)                      0.137
             religionyes                      0.757
              gendermale                      0.483
                     age                      0.054

 Confidence Intervals (centered on cluster-averaged results): 
 
       variable name               CI lower              CI higher
         (Intercept)      -1.84567477037682      0.414084696182316
         religionyes     -0.952467256737726       1.17979244250419
          gendermale     -0.532360229693009      0.318837207458131
                 age    -0.0325665660185535   0.000525559143591033

 Cluster-Adjusted p-values:  
 
           variable name   cluster-adjusted p-value
             (Intercept)                      0.001
                gender.n                      0.085
                     age                       0.02
              genderXage                      0.054
             religionyes                       0.61

 Confidence Intervals (centered on cluster-averaged results): 
 
sh: 1: rm: Permission denied
        variable name                CI lower               CI higher
          (Intercept)        1.09056442712984        1.80358840050465
             gender.n      -0.221391632195028      0.0250407499197962
                  age    -0.00763055007363976    -0.00135529383387161
           genderXage   -4.93657083501635e-05     0.00319110999028355
          religionyes      -0.131218073728837       0.188189507431759

clusterSEs documentation built on April 6, 2021, 1:06 a.m.