Description Usage Arguments Value Note Author(s) References Examples
View source: R/clusterIM.glm.R
Computes p-values and confidence intervals for GLM models based on cluster-specific model estimation (Ibragimov and Muller 2010). A separate model is estimated in each cluster, and then p-values and confidence intervals are computed based on a t/normal distribution of the cluster-specific estimates.
1 2 3 4 5 6 7 8 9 10 |
mod |
A model estimated using |
dat |
The data set used to estimate |
cluster |
A formula of the clustering variable. |
ci.level |
What confidence level should CIs reflect? |
report |
Should a table of results be printed to the console? |
drop |
Should clusters within which a model cannot be estimated be dropped? |
truncate |
Should outlying cluster-specific beta estimates be excluded? |
return.vcv |
Should a VCV matrix and the means of cluster-specific coefficient estimates be returned? |
A list with the elements
p.values |
A matrix of the estimated p-values. |
ci |
A matrix of confidence intervals. |
vcv.hat |
Optional: A cluster-level variance-covariance matrix for coefficient estimates. |
beta.bar |
Optional: A vector of means for cluster-specific coefficient estimates. |
Confidence intervals are centered on the cluster averaged estimate, which can diverge from original model estimates under several circumstances (e.g., if clusters have different numbers of observations). Consequently, confidence intervals may not be centered on original model estimates. If drop = TRUE, any cluster for which all coefficients cannot be estimated will be automatically dropped from the analysis. If truncate = TRUE, any cluster for which any coefficient is more than 6 times the interquartile range from the cross-cluster mean will also be dropped as an outlier.
Justin Esarey
Esarey, Justin, and Andrew Menger. 2017. "Practical and Effective Approaches to Dealing with Clustered Data." Political Science Research and Methods forthcoming: 1-35. <URL:http://jee3.web.rice.edu/cluster-paper.pdf>.
Ibragimov, Rustam, and Ulrich K. Muller. 2010. "t-Statistic Based Correlation and Heterogeneity Robust Inference." Journal of Business & Economic Statistics 28(4): 453-468. <DOI:10.1198/jbes.2009.08046>.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 | ## Not run:
#####################################################################
# example one: predict whether respondent has a university degree
#####################################################################
require(effects)
data(WVS)
logit.model <- glm(degree ~ religion + gender + age, data=WVS, family=binomial(link="logit"))
summary(logit.model)
# compute cluster-adjusted p-values
clust.im.p <- cluster.im.glm(logit.model, WVS, ~ country, report = T)
############################################################################
# example two: linear model of whether respondent has a university degree
# with interaction between gender and age + country FEs
############################################################################
WVS$degree.n <- as.numeric(WVS$degree)
WVS$gender.n <- as.numeric(WVS$gender)
WVS$genderXage <- WVS$gender.n * WVS$age
lin.model <- glm(degree.n ~ gender.n + age + genderXage + religion + as.factor(country), data=WVS)
# compute marginal effect of male gender on probability of obtaining a university degree
# using conventional standard errors
age.vec <- seq(from=18, to=90, by=1)
me.age <- coefficients(lin.model)[2] + coefficients(lin.model)[4]*age.vec
plot(me.age ~ age.vec, type="l", ylim=c(-0.1, 0.1), xlab="age",
ylab="ME of male gender on Pr(university degree)")
se.age <- sqrt( vcov(lin.model)[2,2] + vcov(lin.model)[4,4]*(age.vec)^2 +
2*vcov(lin.model)[2,4]*age.vec)
ci.h <- me.age + qt(0.975, lower.tail=T, df=lin.model$df.residual) * se.age
ci.l <- me.age - qt(0.975, lower.tail=T, df=lin.model$df.residual) * se.age
lines(ci.h ~ age.vec, lty=2)
lines(ci.l ~ age.vec, lty=2)
# cluster on country, compute CIs for marginal effect of gender on degree attainment
# drop the FEs (absorbed into cluster-level coefficients)
lin.model.n <- glm(degree.n ~ gender.n + age + genderXage + religion, data=WVS)
clust.im.result <- cluster.im.glm(lin.model.n, WVS, ~ country, report = T, return.vcv = T)
# compute ME using average of cluster-level estimates (CIs center on this)
me.age.im <- clust.im.result$beta.bar[2] + clust.im.result$beta.bar[4]*age.vec
se.age.im <- sqrt( clust.im.result$vcv[2,2] + clust.im.result$vcv[4,4]*(age.vec)^2 +
2*clust.im.result$vcv[2,4]*age.vec)
# center the CIs on the ME using average of cluster-level estimates
# important: divide by sqrt(G) to convert SE of cluster-level estimates
# into SE of the mean, where G = number of clusters
G <- length(unique(WVS$country))
ci.h.im <- me.age.im + qt(0.975, lower.tail=T, df=(G-1)) * se.age.im/sqrt(G)
ci.l.im <- me.age.im - qt(0.975, lower.tail=T, df=(G-1)) * se.age.im/sqrt(G)
plot(me.age.im ~ age.vec, type="l", ylim=c(-0.2, 0.2), xlab="age",
ylab="ME of male gender on Pr(university degree)")
lines(ci.h.im ~ age.vec, lty=2)
lines(ci.l.im ~ age.vec, lty=2)
# for comparison, here's the ME estimate and CIs from the baseline model
lines(me.age ~ age.vec, lty=1, col="gray")
lines(ci.h ~ age.vec, lty=3, col="gray")
lines(ci.l ~ age.vec, lty=3, col="gray")
## End(Not run)
|
Loading required package: AER
Loading required package: car
Loading required package: carData
Loading required package: lmtest
Loading required package: zoo
Attaching package: 'zoo'
The following objects are masked from 'package:base':
as.Date, as.Date.numeric
Loading required package: sandwich
Loading required package: survival
Loading required package: Formula
Loading required package: plm
When using this package, cite:
Justin Esarey and Andrew Menger (2017).
"Practical and Effective Approaches to Dealing with Clustered Data."
Political Science Research and Methods, FirstView, 1-35.
URL: https://doi.org/10.1017/psrm.2017.42.
Loading required package: effects
lattice theme set by effectsTheme()
See ?effectsTheme for details.
Call:
glm(formula = degree ~ religion + gender + age, family = binomial(link = "logit"),
data = WVS)
Deviance Residuals:
Min 1Q Median 3Q Max
-0.8211 -0.7283 -0.6599 -0.5676 2.0293
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -0.788616 0.125616 -6.278 3.43e-10 ***
religionyes 0.112407 0.096764 1.162 0.245
gendermale -0.076761 0.067243 -1.142 0.254
age -0.013215 0.002031 -6.507 7.67e-11 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 5565.4 on 5380 degrees of freedom
Residual deviance: 5519.4 on 5377 degrees of freedom
AIC: 5527.4
Number of Fisher Scoring iterations: 4
Cluster-Adjusted p-values:
variable name cluster-adjusted p-value
(Intercept) 0.137
religionyes 0.757
gendermale 0.483
age 0.054
Confidence Intervals (centered on cluster-averaged results):
variable name CI lower CI higher
(Intercept) -1.84567477037682 0.414084696182316
religionyes -0.952467256737726 1.17979244250419
gendermale -0.532360229693009 0.318837207458131
age -0.0325665660185535 0.000525559143591033
Cluster-Adjusted p-values:
variable name cluster-adjusted p-value
(Intercept) 0.001
gender.n 0.085
age 0.02
genderXage 0.054
religionyes 0.61
Confidence Intervals (centered on cluster-averaged results):
sh: 1: rm: Permission denied
variable name CI lower CI higher
(Intercept) 1.09056442712984 1.80358840050465
gender.n -0.221391632195028 0.0250407499197962
age -0.00763055007363976 -0.00135529383387161
genderXage -4.93657083501635e-05 0.00319110999028355
religionyes -0.131218073728837 0.188189507431759
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.