fastDR | R Documentation |
Doubly robust treatment effect estimation with non-parametric propensity score estimates
fastDR(form.list,
data,
y.dist="gaussian",
estimand="ATT",
n.trees=3000,
interaction.depth=3,
shrinkage=0.01,
verbose=FALSE,
ps.only=FALSE,
keepGLM=TRUE,
smooth.lm=0,
par_details=gbmParallel(1,1024))
form.list |
a list of formulas giving the components of the model, outcomes, treatment indicator, covariates, observation weights, and observation keys. Consider using |
data |
a data frame containing the variables in the model |
y.dist |
the distribution assumed for the outcome. These should be
specified as character strings (e.g. "gaussian", "quasibinomial",
"quasipoisson"). Using "binomial" will cause inconsequential warning messages
since the model uses non-integer propensity score weights. “quasibinomial”
avoids this. If |
estimand |
either "ATT" (default) for the average treatment effect or "ATE" for the average treatment effect |
n.trees |
the total number of trees to fit for the generalized boosted model. This is equivalent to the number of iterations and the number of basis functions in the additive expansion. The default is most likely appropriate |
interaction.depth |
The maximum depth of variable interactions. 1 implies an additive model, 2 implies a model with up to 2-way interactions, etc. |
shrinkage |
a shrinkage parameter applied to each tree in the generalized boosted model expansion. Also known as the learning rate or step-size reduction. The default is most likely appropriate |
verbose |
If TRUE, the fastDR will print out progress and performance indicators |
ps.only |
If TRUE, the fastDR will skip the DR step and just return the propensity scores. This is useful if you need to do non-standard changes to the regression model, like eliminating some covariates |
keepGLM |
if TRUE, keep a copy of the |
smooth.lm |
a numeric value for penalizing the size of the covariates in the DR step |
par_details |
a list containing the number of threads and array_chunk_size to be passed to |
The form.list has the following components.
y.form
is a formula specifying the outcomes (e.g. ~y
). Multiple
outcomes may be listed (e.g. ~y1+y2
); fastDR will conduct a separate
analysis for each of them
t.form
is a formula specifying the treatment indicator (e.g. ~treat). The
formula can include only one binary treatment coded as 0/1
x.form
is a formula specifying the potential confounding variables that
will be included in the propensity score model and in the outcome model (e.g.
~X1+X2+X2
)
weights.form
is an optional formula specifying observation weights, such as sampling weights
key.form
is a formula giving observation IDs (e.g. ~caseID
). This is required in order to make sure that after returning results, the propensity
score weights can be correctly aligned with cases by the user
fastDR
only estimates the average treatment effect on the treated.
fastDR
uses a generalized boosted model to estimate a propensity score model (McCaffrey, Ridgeway, and Morral, 2004). It uses those propensity scores as weights in a weighted generalized linear model to produce a doubly robust estimate of the treatment effect.
To reuse the propensity score weights in other analyses, make sure that the code aligns the right weight with the right case. Use the key in the data and the key attached to the weights. For example, mydata$psw <- myfastDR$w[match(mydata$observationID,names(myfastDR$w))]
For ATT, treated cases receive weight equal to 1 and comparison cases receive weight p/(1-p). For ATE, treated cases receive weight equal to 1/p and comparison cases receive weight 1/(1-p)
The effective sample size is the number of independent cases that would yield the same precision as the given weighted sample. It is computed as
\frac{(\sum_i (1-t_i)w_i)^2}{\sum_i (1-t_i)w_i^2}
When smooth.lm
is greater than 0, fastDR
uses a data augmentation method to impose a penalty on the regression coefficients in the DR step. At a minimum, modest penalties provide numerical stability in the event of correlation among the covariates or certain features being highly predictive of the outcome. fastDR
does not penalize the intercept or the treatment indicator. fastDR
uses the data augmentation approach described in Greenland and Mansournia (2015). This approach uses ridge regression penalties (N(0,m)) for Gaussian outcomes, log-F(m,m) prior for logistic regression, and gamma(m,m) prior for Poisson regression. The appropriate size of the penalty is subjective, but values between 0.1 and 4.0 are modest penalties for general application.
If the number of treatment cases exceeds 1500, then fastDR
will take a random sample of 1500 cases for computing the standard errors of the estimates of E(Y1|t=1) and E(Y1-Y0|t=1). Computing the standard error requires computing the mean of the n1*n1 covariance matrix of predicted values, which can require a lot of memory when n1 is large, but can be efficiently estimated with a sample. Note that there will be some amount of sampling variability.
fastDR
returns a fastDR.object
, a list containing
ks |
the largest KS statistic in the balance table (see |
ks.un |
the largest KS statistic in the balance table before propensity score weighting (see |
balance.tab |
the balance table showing the treatment means and propensity score weighted control means for each of the terms specified in |
balance.tab.un |
the balance table showing the treatment means and unweighted control means for each of the terms specified in |
w |
the propensity score weights used in the analysis. Use |
p |
the estimated propensity scores. |
n1 , ESS |
the number of cases in the treatment group and the effective sample size of the control group |
glm.un , glm.ps , glm.dr |
the generalized linear model fits using only sampling weights ( |
effects |
a table showing the results. |
y.dist |
contains the value of |
shrinkage |
the shrinkage parameter used in the propensity score model |
Greg Ridgeway gridge@upenn.edu
D. McCaffrey, G. Ridgeway, and A. Morral (2004). “Propensity score estimation with boosted regression for evaluating adolescent substance abuse treatment,” Psychological Methods 9(4):403-425.
J.H. Friedman (2001). “Greedy Function Approximation: A Gradient Boosting Machine,” Annals of Statistics 29(5):1189-1232.
S. Greenland and M.A. Mansournia (2015). “Penalization, bias reduction, and default priors in logistic and related categorical and survival regressions,” Statistics in Medicine 34:3133-3143.
S. Greenland, M.A. Mansournia, and D.G. Altman (2014). “Sparse data bias: a problem hiding in plain sight,” The BMJ 353:i1981.
http://sites.google.com/site/gregridgeway
gbmt
, gbmParallel
# NHANES example from survey package
data(nhanes)
# add a unique ID to each row
nhanes$observationID <- 1:nrow(nhanes)
# recode the "treatment" (male) to a 0/1 indicator
nhanes$male <- as.numeric(nhanes$RIAGENDR==1)
# make "race" a factor with descriptive labels
nhanes$race <- factor(nhanes$race,
levels=c(1,2,3,4),
labels=c("Hispanic","non-Hispanic white",
"non-Hispanic black","other"))
dr1 <- fastDR(list(y.form=~HI_CHOL, # high cholesterol (over 240mg/dl)
t.form=~male, # compare males to similar females
x.form=~race+agecat, # potential confounders
weights.form=~WTMEC2YR, # sampling weights
key.form=~observationID),
data=nhanes,
y.dist="quasibinomial", # outcome is 0/1
n.trees=1000,
interaction.depth=3,
shrinkage=0.005,
verbose=TRUE, # print out detailed progress
smooth.lm=0.1, # stabilize regression coefs
par_details=gbmParallel(1,1024)) # just use one core
# show balance table
round(dr1$balance.tab,3)
# compute percentage difference
print(dr1, type="outcome", model="dr")
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.