schoolgrowth | R Documentation |
Uses student-level growth scores for particular years, grades, and subjects to compute direct estimates and empirical best linear predictors (EBLPs) of specified school-level aggregate growth targets. Estimated MSEs for both direct estimates and EBLPs are also provided.
schoolgrowth(d, target = NULL, target_contrast = NULL, control = list())
d |
A dataframe consisting of student-level growth scores. Each record
of |
target |
A named character vector of length at most 4 that is used to define the target estimand for each school. The function computes the direct estimate and EBLP of this target for each school (see Details), provided that the school has at least some data in the required blocks. The names of The valid values for the 'years' element of The valid values for the 'subjects' element of The valid values for the 'grades' element of The combination of the 'years', 'subjects' and 'grades' elements of
See the Examples section for examples for specifying |
target_contrast |
An optional named character vector, analogous to See the Examples section for an example for specifying
|
control |
An optional named list of control arguments. Named elements that are not among the following list will be ignored.
|
Details on the statistical model and estimation procedures are
provided in Lockwood, Castellano and McCaffrey (2020). Each record of
d provides the growth score for a given student from a given 'block',
where a block is defined as the combination of a given year, a given
grade level, and a given subject. These data are used to estimate the
parameters of a linear mixed-effects model.
Let B be the total number of blocks represented in the data and S be the total number of schools represented in the data. Students are first partitioned into "patterns" defined by the set of blocks for which they have observed growth scores. The linear mixed-effects model then assumes that the growth score Y(i,b) for a given student i in a given block b depends on a pattern-specific, block-specific mean parameter, a fixed effect for the school s to which student i is associated for block b, a random effect for the combination of school s and block b, and a residual error. Residual errors across blocks for the same student are allowed to be correlated, but are assumed to be independent across students. School-by-block random effects are allowed to be correlated within schools, are constrained to satisfy sum-to-zero constraints within schools for model identifiability, and are assumed to be independent across schools. The model parameters are the block-by-pattern means, the s=1,...,S school fixed effects, the (B x B) variance-covariance matrix G of the school-by-block random effects, and the (B x B) variance-covariance matrix R specifying the variances and covariances of the residual errors. The matrix G has rank at most B-1 due to the sum-to-zero constraints on the school-by-block random effects.
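The model described above can be written compactly as follows. The notation is an assumption introduced here for illustration (p(i) for the pattern of student i, s(i,b) for the school linked to student i in block b, and the Greek symbols); the paper may use different symbols, and the sum-to-zero constraint is shown in its simplest unweighted form, which may differ from the exact form used there.

```latex
Y_{ib} = \mu_{p(i),b} + \alpha_{s(i,b)} + \theta_{s(i,b),b} + \epsilon_{ib},
\qquad
\operatorname{Var}(\theta_{s1},\ldots,\theta_{sB}) = G,
\qquad
\operatorname{Var}(\epsilon_{i1},\ldots,\epsilon_{iB}) = R,
\qquad
\sum_{b=1}^{B} \theta_{sb} = 0 .
```

Here \mu is the block-by-pattern mean, \alpha the school fixed effect, \theta the school-by-block random effect, and \epsilon the residual error. A single linear constraint on the B random effects of a school is what limits the rank of G to at most B-1.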
The schoolgrowth function computes estimates of these parameters using
the moment-based estimation procedure detailed in Lockwood, Castellano
and McCaffrey (2020). It then uses these parameter estimates, in
conjunction with the observed growth scores, to construct the EBLP of
the target estimand for each school. The target estimand for each
school is a linear combination of fixed effects and random effects. It
represents a hypothetical average growth for a school that would be
observed if there were infinitely many growth scores observed for the
school for each of the blocks that are part of the target. The MSE of
the EBLP is estimated for each school using a first-order plug-in
approximation and, if control$jackknife = TRUE, an additional
second-order term.
The schoolgrowth function checks the relationship between
block-by-pattern indicator (dummy) variables and school indicator
variables to ensure that the design matrix based on these two sets of
variables has full column rank. It halts with an error message if
this condition is not met. The condition will not be met if there
exists a proper subset of block-by-pattern combinations, and a proper
subset of schools, with the property that a growth score is linked to
a block-by-pattern combination in the first subset if and only if it
is also linked to a school in the second subset. For example, such a
situation could arise in the analysis of data from grades 4-8 where
each school in the sample served either grades 4-5 or grades 6-8,
with no schools serving more extensive grade ranges.
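The connectivity condition above can be explored with a small rank calculation. The sketch below is an illustration in base R, not the package's own check: the helper name rank_deficiency and the toy cell and school labels are hypothetical. It uses complete indicator sets for both factors, so one linear dependency is always present (both sets of columns sum to a column of ones); each additional disconnected group of cells and schools adds one more.

```r
# Hypothetical helper: number of linear dependencies among the
# block-by-pattern ("cell") indicators and the school indicators.
rank_deficiency <- function(cell, school) {
  X <- cbind(outer(cell, unique(cell), "=="),
             outer(school, unique(school), "==")) * 1
  ncol(X) - qr(X)$rank
}

# Schools A and B only appear in cells c1 and c2 (think grades 4-5),
# school C only in cell c3 (think grades 6-8): two disconnected groups.
cell_split   <- c("c1", "c2", "c1", "c2", "c3", "c3")
school_split <- c("A",  "A",  "B",  "B",  "C",  "C")
def_split <- rank_deficiency(cell_split, school_split)

# Adding a school D with scores in both c2 and c3 links the two groups.
cell_linked   <- c(cell_split, "c2", "c3")
school_linked <- c(school_split, "D", "D")
def_linked <- rank_deficiency(cell_linked, school_linked)

print(c(split = def_split, linked = def_linked))
```

A deficiency above one in this parameterization signals the kind of separation described above. How the package parameterizes its own design matrix is not stated in this section.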
The EBLP operations require R and G to be positive semidefinite (PSD).
The raw estimates of these matrices may not be PSD, in which case they
must be adjusted (coerced) to be PSD. For each of R and G, the
schoolgrowth function provides two options for this coercion; refer to
the control options Radj_method, Radj_eig_tol, Radj_eig_min,
Gadj_method, Gadj_eig_tol, and Gadj_eig_min.
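To make the idea of PSD coercion concrete, here is a minimal base R sketch of one generic approach, eigenvalue truncation. It is an assumption chosen for illustration: this section does not describe what the two options behind Radj_method and Gadj_method actually do, and the function name make_psd and its eig_min argument are hypothetical.

```r
# Generic eigenvalue truncation (illustrative only; not necessarily
# either of the adjustment methods implemented in the package).
make_psd <- function(M, eig_min = 0) {
  e <- eigen(M, symmetric = TRUE)
  vals <- pmax(e$values, eig_min)   # raise eigenvalues below eig_min
  e$vectors %*% diag(vals, nrow = length(vals)) %*% t(e$vectors)
}

# A symmetric matrix with a negative eigenvalue (determinant < 0)
G_raw <- matrix(c(1.0, 0.9,
                  0.9, 0.5), nrow = 2, byrow = TRUE)
G_adj <- make_psd(G_raw)

print(eigen(G_raw, symmetric = TRUE)$values)
print(eigen(G_adj, symmetric = TRUE)$values)
```

Truncation changes the matrix as little as possible in the directions that were already valid, which is one reason it is a common default for this kind of repair.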
The function can be used to compute direct estimates, EBLPs, and
associated MSE estimates for grouping variables other than schools
(e.g., school districts) simply by passing that grouping variable as
the school element of d.
A list, with elements specified below. Most of these elements contain
meta-data provided for diagnostic purposes and will not be needed by
many users. The most important element is aggregated_growth. We
describe this element first, and then the remaining elements.
aggregated_growth:
A dataframe with one record per school providing the EBLP estimate
of the target estimand for each school, along with other
information. Fields include: school, the school ID variable from d;
gconfig, a character string summarizing the "grade configuration" of
the school, defined here as the unique set of grades with growth
scores linked to the school over all records in d; ntotal, the total
number of growth scores linked to the school over all records in d;
ntarget, the number of growth scores linked to the school in blocks
that are part of the target; ncontrast, the number of growth scores
linked to the school in blocks that are part of the contrast (if
target_contrast is not NULL); est.direct, the "direct estimate" of
the target estimand, equal to the appropriately-weighted average of
the growth scores in the target blocks; mse.direct, the estimated MSE
of the direct estimate; est.blp, the EBLP of the target estimand;
mse.blp, the estimated MSE of the EBLP; est.hybrid, equal to est.blp
if mse.blp is less than or equal to mse.direct and otherwise equal to
est.direct; mse.hybrid, equal to the minimum of mse.blp and
mse.direct; and prmse.direct, equal to 1 - (mse.blp / mse.direct),
the estimated proportional reduction in MSE for the EBLP relative to
the direct estimate.
For schools that do not have growth scores in any of the blocks
that are part of the target estimand, the fields est.direct,
mse.direct, est.blp, mse.blp, est.hybrid, mse.hybrid, and
prmse.direct are NA.
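The definitions of est.hybrid, mse.hybrid, and prmse.direct given above can be reproduced from the other four fields. The sketch below applies them to a made-up two-school data frame; the numbers are invented for illustration and are not output from the package.

```r
# Invented values for two schools (not package output)
ag <- data.frame(est.direct = c(0.20, -0.10),
                 mse.direct = c(0.04,  0.010),
                 est.blp    = c(0.15, -0.05),
                 mse.blp    = c(0.02,  0.012))

# Use the EBLP when its estimated MSE is no larger than that of the
# direct estimate; otherwise fall back to the direct estimate.
use_blp <- ag$mse.blp <= ag$mse.direct
ag$est.hybrid   <- ifelse(use_blp, ag$est.blp, ag$est.direct)
ag$mse.hybrid   <- pmin(ag$mse.blp, ag$mse.direct)
ag$prmse.direct <- 1 - ag$mse.blp / ag$mse.direct

print(ag)
```

The second school shows that prmse.direct can be negative: when the estimated MSE of the EBLP exceeds that of the direct estimate, the hybrid estimate reverts to the direct estimate.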
control:
The value of control used during estimation, including defaults for
elements that are not specified by the user in the function call.
target:
The value of target used during estimation.
target_contrast:
The value of target_contrast used during estimation (NULL if
target_contrast is not used).
dblock:
A dataframe with B rows providing block-level meta-data.
There is one row for each block. Fields include the year, the
grade level, the subject, the block label, a numeric block ID
variable ranging from 1 to B, the number of growth scores in the data
for each block, and a logical variable indicating whether each block
is part of the target. If target_contrast is specified, dblock also
includes a logical variable indicating whether each block is included
in the contrast.
dblockpairs:
A dataframe with B(B+1)/2 rows providing meta-data for blocks and all
pairwise combinations of blocks. Rows are in column-major order for a
symmetric matrix. Fields include the block labels for each member of
the pair, the numeric block ID for each member of the pair, the number
of schools with growth scores in both members of the pair, the
number of students with growth scores in both members of the pair,
and information about the estimated matrices G and R. The field Graw
contains the estimated covariance parameters prior to PSD adjustment,
whereas the field G contains the estimated covariance parameters after
PSD adjustment. The fields R_est, Rraw and R are analogous for the
residual error variance-covariance matrix.
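As a rough illustration of the row layout, the sketch below enumerates the B(B+1)/2 block pairs of a symmetric matrix in column-major order. It assumes the upper triangle is the one being traversed, which this section does not state, and the column names block1 and block2 are hypothetical rather than the actual field names in dblockpairs.

```r
B <- 3
M <- matrix(0, B, B)

# Upper triangle (including the diagonal), visited column by column.
pairs <- which(upper.tri(M, diag = TRUE), arr.ind = TRUE)
colnames(pairs) <- c("block1", "block2")   # hypothetical names

print(pairs)
print(nrow(pairs) == B * (B + 1) / 2)
```

With B = 12 blocks, as in the growthscores example data, the same count gives 78 rows.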
dsch:
A list with one element per school containing meta-data for each
school used during the estimation procedure. Fields that are most
likely to be of interest to users include: school, the school ID;
oblocks, a vector of block IDs containing the set of blocks for which
the school has associated growth measures; nblock, the length of
oblocks; N, a (B x B) sparse symmetric matrix indicating the counts
of growth scores in each block for the school (diagonal elements)
and the counts of students with growth scores in both elements of
each pair of blocks (off-diagonal elements); mu, the estimated school
fixed effect; var_muhat, the estimated variance of mu; tab, a
dataframe containing block-level average growth scores and
block-level EBLPs for the school, along with other information; R_sb,
a sparse symmetric matrix estimating the variance-covariance matrix of
the student-level sampling errors of the block-level average growth
scores for the school; mse_blp, the estimated MSE of the block-level
EBLPs for the school; and weights, a matrix providing the weights that
each block received in the computation of the direct estimate and EBLP
for the school. The weights element is NULL for schools with no growth
measures in any of the target blocks.
bhat_ols:
The estimated block-by-pattern means and school fixed effects
obtained during first-stage OLS estimation. Note that the
estimated school fixed effects in bhat_ols will not generally
correspond to the final estimated school fixed effects (the mu
elements of dsch), which are obtained using generalized least squares
in a later estimation step. The bhat_ols element is NULL if
control$alpha_zero = TRUE.
modstats:
A list containing miscellaneous summary statistics from the data
and estimation procedure. Elements include: ntot, the total number of
growth scores used in the estimation procedure (i.e., nrow(d)); nstu,
the number of unique students in d; nsch, the number of unique schools
in d; varY, the sample variance of the growth scores in d;
estimated_variance_among_schools, the estimate of the variance among
the school fixed effects; and estimated_percvar_among_schools, the
estimate of the percentage of variance in the growth scores explained
by the school fixed effects.
tab_patterns:
A dataframe summarizing the student patterns. Fields include
pattern, a character string consisting of 0s and 1s indicating the
blocks in which the pattern does or does not have observed growth
scores; pcount, the number of students represented by the pattern;
and cpattern, equal to pattern if the pattern is maintained during
the estimation procedure, or "collapsed" if the pattern was collapsed
during the estimation procedure.
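The pattern strings can be pictured with a small base R sketch that tabulates, for a toy dataset, which blocks each student has scores in. This is an illustration of the concept only: the block labels and their ordering are invented here, the toy data are made up, and the package's rules for collapsing patterns are not reproduced.

```r
# Toy data: three students, two blocks (invented values)
d <- data.frame(stuid   = c(1, 1, 2, 3, 3),
                year    = c(1, 2, 1, 1, 2),
                grade   = c(4, 5, 4, 4, 5),
                subject = "math")

# Hypothetical block label: one per year-by-grade-by-subject combination
d$block <- paste0("t", d$year, "_g", d$grade, "_", d$subject)
blocks  <- sort(unique(d$block))

# Student-by-block indicator of observed growth scores
ind <- table(d$stuid, factor(d$block, levels = blocks)) > 0

# One 0/1 string per student, then the number of students per pattern
pattern <- apply(ind, 1, function(x) paste(as.integer(x), collapse = ""))
tab <- as.data.frame(table(pattern), stringsAsFactors = FALSE)
names(tab) <- c("pattern", "pcount")

print(tab)
```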
G:
The estimated variance-covariance matrix G of the school-by-block
random effects, after any PSD adjustment. It is a member of the class
dspMatrix of dense, symmetric matrices from the Matrix package.
R:
The estimated variance-covariance matrix R of the residual
errors, after any PSD adjustment. It is a member of the class
dsCMatrix of sparse, symmetric matrices from the Matrix package.
d:
If control$return_d = TRUE, a dataframe similar to the input d but
with additional fields created by the function.
G_jack:
If control$jackknife = TRUE, a list of length equal to the number of
jackknife batches used in the EBLP MSE estimation procedure. Each
element is a member of the class dspMatrix of dense, symmetric
matrices from the Matrix package, equal to the estimated value of G
for each jackknife batch.
Gstar_jack:
If control$jackknife = TRUE, a list of length equal to the number of
jackknife batches used in the EBLP MSE estimation procedure. Each
element is a member of the class dspMatrix of dense, symmetric
matrices from the Matrix package, equal to the estimated value of G*
for each jackknife batch. Each is a ((B-1) x (B-1)) matrix
corresponding to the estimable parameters of G in light of the
sum-to-zero constraints. See Lockwood, Castellano and McCaffrey (2020)
for details.
varhat_G:
If control$jackknife = TRUE, a jackknife variance estimate of each
element of the estimated value of G. The estimated variances are
organized as a (B x B) matrix, so that the (i,j) element of varhat_G
is the jackknife variance estimate of the (i,j) element of G.
J.R. Lockwood jrlockwood@ets.org
Lockwood J.R., Castellano K.E. and McCaffrey D.F. (2020). "Improving accuracy and stability of aggregate student growth measures using best linear prediction". Unpublished.
data(growthscores)
print(head(growthscores))
print(c(nrecords = nrow(growthscores),
        nstudent = length(unique(growthscores$stuid)),
        nschool  = length(unique(growthscores$school))))

## data have B=12 blocks:
print(unique(growthscores[,c("year","grade","subject")]))

## fit model
##
## NOTE: jackknife step skipped with 'jackknife=FALSE' control option
## only to minimize computation time for the example
m <- schoolgrowth(growthscores,
                  target  = c(years="final", subjects="math", grades="all", weights="n"),
                  control = list(quietly=TRUE, mse_blp_chk=FALSE, jackknife=FALSE))
print(names(m))

## summary information for blocks and block pairs
print(m$dblock)
print(nrow(m$dblockpairs)) ## B*(B+1)/2
print(head(m$dblockpairs))

## summary information for each school
print(length(m$dsch))
print(names(m$dsch[[1]]))

## miscellaneous meta-data
print(m$modstats)

## summary of student patterns
print(m$tab_patterns)

## estimated variance components
print(m$G)
print(m$R)

## direct, EBLP and hybrid estimates of target for each school,
## along with estimated MSEs
print(m$aggregated_growth)

## Not run:
## changing the target
##
## math, final year, grade 4:
tmp <- schoolgrowth(growthscores,
                    target = c(years="final", subjects="math", grades="4", weights="n"))

## math and ela, both years, grade 4:
tmp <- schoolgrowth(growthscores,
                    target = c(years="1,2", subjects="ela,math", grades="4", weights="n"))

## ela, both years, grades 5,6 equally weighted rather than student-weighted:
tmp <- schoolgrowth(growthscores,
                    target = c(years="1,2", subjects="ela", grades="5,6", weights="equal"))
print(tmp$dsch[[1]]$weights)

## defining a target that is the difference in math growth between years
## 2 and 1, where growth in each year is student-weighted across grades:
tmp <- schoolgrowth(growthscores,
                    target = c(years="2", subjects="math", grades="all", weights="n"),
                    target_contrast = c(years="1", subjects="math", grades="all", weights="n"))
print(tmp$dsch[[1]]$weights)

## End(Not run)