knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) #devtools::load_all("~/repositories/_r/propertee/") library(propertee)
The propertee package, Prognostic Regression Offsets with Propagation of ERrors (for Treatment Effect Estimation), facilitates direct adjustment or standardization for experiments and observational studies, with the option of using a separately fitted prognostic regression model for covariance adjustment. Propertee calls for a self-standing specification of a study's design and treatment allocations, using these to create weights needed to align treatment and control samples for estimation of treatment effects averaged across a common standard population, with standard errors that appropriately reflect the study design. For covariance adjustment it enables offsetting the outcome against predictions from a dedicated covariance model, with standard error calculations propagating error as appropriate from the covariance model.
The main workflow consists of two main steps and one optional step:
StudySpecification
object. This
is accomplished with the obs_spec()
, rct_spec()
or rd_spec()
functions.propertee
, for
example lm()
or glm()
from the stats
package, or lmrob
from
the robustbase
package.lm()
to contrast the separately reweighted treatment and
control groups, incorporating optional covariance adjustments as an
offset
to lm()
. The StudySpecification
is unobtrusively
retained, as is error propagation from the optional covariance
adjustment, for use in subsequent standard error calculations.
This is done using the lmitt()
function, which either takes and
enriches a fitted lm
or, in the more fully supported usage,
internally calls lm()
to itself create a directly adjusted
contrast of treatment conditions.The example dataset comes from the state of Tennessee's Student-Teacher Achievement Ratio (STAR) experiment. Students were randomly assigned to three possible classroom conditions: small (13 to 17 students per teacher), regular class (22 to 25 students per teacher), and regular-with-aide class (22 to 25 students with a full-time teacher's aide).
data("STARplus") table(STARplus$cond_at_entry)
For simplicity for this first example, we will examine a single binary treatment - "small" classrooms versus "regular" and "regular+aide" classrooms.
STARplus$cond_small <- STARplus$cond_at_entry == "small" table(STARplus$cond_small)
After this basic example, we will see how propertee makes it easy to handle
non-binary treatment variables by introducing dichotomy
s.
The outcome of interest is a reading score at the end of kindergarten.
summary(STARplus$g1treadss)
We first take the random assignment scheme to have been either complete or
Bernoulli randomization of students, separately by school. (Later we'll change this to the more realistic assumption of complete or Bernoulli randomization, separately by school and year of entry into the study.) Accordingly, experimental blocks are given by the school_at_entry
variable.
length(unique(STARplus$school_at_entry)) head(table(STARplus$school_at_entry))
We need a unique identifier for the unit by which treatment was allocated. STAR treatments were assigned to students, as opposed to classrooms or other aggregates of students; accordingly we need a unique identifier per student. The stdntid
variable fills this role.
length(unique(STARplus$stdntid)) head(STARplus$stdntid)
StudySpecification
The three _spec
functions (rct_spec()
, obj_spec()
, and rd_spec()
)
operate similarly. The first argument is the most important, and encodes all the
specification information through the use of a formula. The left-hand side of
the formula identifies the treatment variable. The right-hand side of the
formula consists of the following potential pieces of information:
unit_of_assignment()
: This identifies the variable(s) which indicate the
units of assignment. This is required for all specifications. The alias
uoa()
can be used in its place.block()
: The identifies the variable(s) which contain block information.
Optional.forcing()
: In regression discontinuity specifications (rd_spec()
), this
identifies the variable(s) which contain forcing information.To define a StudySpecification
in our example:
spec <- obs_spec(cond_small ~ unit_of_assignment(stdntid) + block(school_at_entry), data = STARplus, na.fail = FALSE) summary(spec)
Should additional variables be needed to identify the unit of assignment,
block, or forcing, they can be included. Had the variable identifying the years in which participants entered the study been available, for example, we might have identified the experimental blocks as combinations of school and year of study entry, rather than as the school alone. To do this we would pass block(year_at_entry, school_at_entry)
, rather than block(school_at_entry)
, in the above.
The main function for estimating treatment effects is the lmitt()
function. It
takes in three main required arguments:
StudySpecification
.Note that the data set does not need to be the same data set which generated
the StudySpecification
; it does however need to include the same variables to
identify the units of assignment. (If the variable names differ, the by=
argument can be used to link them, though we recommend renaming to reduce the
likelihood of issues.)
For example, you may have one dataset containing school-level information, and a
separate dataset containing student-level information. Assume school is the unit
of assignment. While you could of course merge those two data-sets,
propertee can instead use the school-level data to define the
StudySpecification
, and the student-level data to estimate the treatment
effect.
The formula entering lmitt()
can take on one of two forms:
y ~ 1
will estimate the main treatment effect on outcome variable y
, and
y ~ x
will estimate subgroup specific treatment effects for each level of x
for the
outcome y
, if x
is categorical. For continuous x
, a main effect and a
treatment-x
interaction effect is estimated.
Therefore, to estimate the treatment effect in our example, we can run:
te <- lmitt(g1treadss ~ 1, data = STARplus, specification = spec) summary(te)
The data includes ethnicity; we can estimate subgroup effects by ethnicity:
te_s <- lmitt(g1treadss ~ race, data = STARplus, specification = spec) summary(te_s)
Study specification weights can be easily included in this estimation. propertee supports average treatment effect (ATE) and effect of the treatment on the treated (ETT) weights.
To include one of the weights, simply include the weights = "ate"
or
weights = "ett"
argument to lmitt()
:
lmitt(g1treadss ~ 1, data = STARplus, specification = spec, weights = "ate") lmitt(g1treadss ~ 1, data = STARplus, specification = spec, weights = "ett")
Internally, these call the ate()
or ett()
functions which can be used
directly.
head(ate(spec, data = STARplus))
When included inside lmitt()
, you do not need to specify any additional
arguments to ate()
or ett()
, enabling easy functions of weights. For example
if you had some other weight variable, say wgt
, you could include weights =
wgt*ate()
in the lmitt()
call.
By itself, lmitt()
does not allow for other covariates; e.g. something like
lmitt(y ~ 1 + control_var,...
will fail. To adjust for covariates, a separate
covariate model should be fit. Any model which supports a predict()
function
should work.
camod <- lm(g1treadss ~ gender * dob + race, data = STARplus)
The cov_adj()
function can be used to process the covariance adjustment model
and produce the required values; and its output can be passed as an offset=
.
lmitt(g1treadss ~ 1, data = STARplus, specification = spec, weights = "ate", offset = cov_adj(camod))
Similarly to the weight functions, cov_adj()
attempts to locate the correct
arguments (in this case, mainly the data=
argument) to use in the model
command; while cov_adj()
does fall back to using the data which is in the
covariance model, its safer to use the newdata=
argument if calling
cov_adj()
outside of the model.
head(cov_adj(camod, newdata = STARplus))
Also, similarly to weights, cov_adj()
can be used in normal modeling commands
as well.
lm(g1treadss ~ cond_small, data = STARplus, weights = ate(spec), offset = cov_adj(camod))
If fixed effects for blocks are desired, which can be absorbed away to avoid
estimating, the absorb=TRUE
argument can be passed.
lmitt(g1treadss ~ 1, data = STARplus, specification = spec, absorb = TRUE)
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.