Adaptive FAB $p$-values are constructed for evaluating the mean test scores of 10th graders in a population of high schools. Information is shared across schools using a Fay-Herriot random effects model. This document serves as the replication code for the example in Section 3.2 of the article "Smaller $p$-values via indirect information" (Hoff 2019).
The data for this example are from the 2002 Educational Longitudinal Survey, data from which can be obained at https://nces.ed.gov/surveys/els2002/. A subset of the data needed to replicate the example is available on my website and can be loaded using the following command:
load(url("http://www2.stat.duke.edu/~pdh10/Datasets/els.RData"))
Now take a look at all the variables:
els[1:10,]
Separate out student-level characteristics from school-level characteristics:
y<-els$rscore group<-els$school groups<-sort(unique(group)) W<-as.matrix(els[,c("flp","urban","rural","private","catholic","rm","rs","rw","enrollment") ] ) X<-apply(W,2,function(x){ tapply(x,group,"mean") } )
Here, y
is the student-level reading score, and group
is the school id for each student. That is,
group[i]
is the school of student with score y[i]
.
The matrix X
is a matrix of school-level variables, and the
matrix W
is the same data except with the variable for each school replicated for each student in the school.
Let $\theta_j$ be the "true" mean test score in school $j$. The standard two-sided $p$-value for testing $H_j:\theta_j=50$ is based on the $t$-statistic $t_j = \sqrt{n_j}( \bar y_j - 50)/\hat\sigma_j$.
theta0<-50 YBAR<-tapply(y,group,mean) - theta0 S2<-tapply(y,group,var) N<-c(table(group)) T<-YBAR/sqrt(S2/N) pU<-c(2*pt(-abs(T),N-1))
To compute the FAB $p$-values, we need to first obtain
indirect information about each $\theta_j$ using data from schools other than
school $j$.
We quantify this indirect information using the
following linking model for school-specific means:
[
\theta_j = \beta^\top x_j + \tau \gamma_j
]
where $\beta$ and $\tau$ are unknown,
$x_j$ is a vector of observed characteristics of school $j$,
and $\gamma_1,\ldots, \gamma_p \sim$ are i.i.d. standard normal
random variables. For each school $j$, we get an estimate of these
parameters using the data from schools other than $j$, along with
an estimate of the within school sampling variance $\sigma^2$. This is
done using the lmer
command in the lme4
package:
library(lme4) BETA<-TAU<-SIGMA<-NULL for(i in 1:length(groups)){ g<-groups[i] iy<-y[group!=g] iW<-W[group!=g,] igroup<-group[group!=g] ifit<-lmer(iy ~ iW + (1|igroup) ) beta<-fixef(ifit) tau<-sqrt(unlist(VarCorr(ifit))) sigma<-attr(VarCorr(ifit), "sc") BETA<-rbind(BETA,beta) TAU<-c(TAU,tau) SIGMA<-c(SIGMA,sigma) #cat(g,"\n") }
Note that the $j$th elements of BETA
, TAU
and SIGMA
are estimated
WITHOUT data from school $j$, and so are independent of the data in school
$j$.
Based on these linking model parameter estimates we compute an indirect/prior mean $\theta_j$: Given school-level covariates $x_j$, the fitted predicted value for $\theta_j$ based on the linking model is $\hat\beta^\top x_j$:
ETHETA<-apply( BETA*cbind(1,X) ,1,sum) - theta0
Here we have centered things around the hypothesized mean value of $\theta_0=50$.
The FAB $p$-value for school $j$ is $1-|F( T_j + \tilde b_j ) - F(-T_j)|$, where $T_j$ is the $t$ statistic for school $j$, $b_j = 2 \tilde \mu (\tilde \sigma/\sqrt{n_j})/\tilde\tau^2$, and $F$ is the CDF of the appropriate $T$-distribution.
pF<- c(1-abs( pt(T+2*ETHETA*(SIGMA/sqrt(N))/(TAU^2),N-1) - pt(-T,N-1) ) )
Save some results:
resultsFHmodel<-list(T=T,YBAR=YBAR,S2=S2,N=N,pU=pU,ETHETA=ETHETA,TAU=TAU,SIGMA=SIGMA,pF=pF) save(resultsFHmodel,file="resultsFHmodel.RData")
sum(pF<=.05) sum(pU<=.05) mean(pF<=pU)
par(mfrow=c(1,3),mar=c(3,3,1,1),mgp=c(1.75,.75,0)) plot(resultsFHmodel$ETHETA,resultsFHmodel$YBAR, xlab="indirect estimate of school-level mean", ylab="direct estimate of school-level mean") abline(0,1,col="gray") pU<-resultsFHmodel$pU pF<-resultsFHmodel$pF plot(pU,pF,xlab="UMPU p-value",ylab="FAB p-value") abline(0,1,col="gray") plot(sort(pU),type="l",xlim=c(0,350),ylim=c(0,.06),lwd=2,col="pink", xlab="rank",ylab="p-value") lines(sort(pF),col="lightblue",lwd=2) abline(h=.05,col="gray") legend(5,.045,lwd=c(2,2),col=c("pink","lightblue"),legend=c("UMPU","FAB"), bty="n")
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.