```r
options(width = 300)
options(scipen = 20)
knitr::opts_chunk$set(echo = TRUE)
```
The `washb` R package was developed to help standardize and facilitate intention-to-treat analyses in the WASH Benefits trials. This vignette provides an overview of the main functions included in the package. As an illustrative example, it uses the Bangladesh trial primary outcomes (diarrhea, length-for-age Z-scores) to calculate several unadjusted and adjusted estimates of intervention effects, illustrating a range of estimators from unadjusted approaches such as the paired t-test (see `washb_ttest`) to targeted maximum likelihood estimation with ensemble machine learning (see `washb_tmle`).
For unadjusted analyses in a paired design, the paired t-test (`washb_ttest`) for continuous outcomes and the Mantel-Haenszel pooled risk difference or ratio (`washb_mh`) for binary outcomes are "oldies but goodies" -- they provide a simple and robust way to compare outcomes between arms. However, for unpaired analyses and/or any type of adjusted analysis, you should consider regression (`washb_glm`) or semiparametric, doubly robust (`washb_tmle`) estimators.
* `washb_mean` -- Means estimated with robust standard errors for the WASH Benefits trials. Calculates the mean of a variable along with a robust sandwich SE and 95% confidence interval that account for clustering within `id`.
* `washb_ttest` -- Implements a paired t-test to compare continuous outcomes between two arms of the study.
* `washb_mh` -- Implements the pooled Mantel-Haenszel estimator of the ITT prevalence ratio or difference between two arms of the study.
* `washb_glm` -- Estimates ITT effects using generalized linear models (GLM) with robust standard errors for unadjusted and adjusted estimates. Converts coefficients to relative risks (for binary outcomes), calculates confidence intervals, and pre-screens possible adjustment covariates using the internal `washb_prescreen` function. Can be used for subgroup analyses (tests of effect modification).
* `washb_tmle` -- Estimates ITT effects using targeted maximum likelihood estimation, which is particularly useful for adjusted analyses. Like `washb_glm`, this function provides measures of effect on the absolute and relative scales (for binary outcomes) and pre-screens possible adjustment covariates using the internal `washb_prescreen` function. Unlike the other estimation approaches in the package, `washb_tmle` can account for outcomes that are missing systematically by treatment arm or baseline covariates.
* `washb_permute` -- Conducts a permutation test of the independence of an outcome and treatment arm, conditional on randomization block, using the Wilcoxon rank-sum test statistic.
Internal package functions (called by the main functions above, but available for direct use):

* `washb_prescreen` -- Pre-screens adjustment covariates, restricting to those with a likelihood ratio test P < 0.2 (or a user-set p-value). Called internally by the `washb_glm` and `washb_tmle` functions, or can be called on its own.
* `sandwichSE` -- Calculates the Huber sandwich estimator for robust standard errors (Freedman 2006).
* `washb_lincom` -- Estimates SEs, CIs, and P-values for Pr>|Z| from a linear combination of GLM regression coefficients.
In addition to these functions, the package includes data from the WASH Benefits trial completed in Bangladesh by the UC Berkeley team. The datasets include:
* `washb_bang_tr` -- Bangladesh treatment assignments dataset
* `washb_bangladesh_anthro` -- Bangladesh anthropometry analysis dataset
* `washb_bangladesh_diar` -- Bangladesh diarrhea dataset
* `washb_bangladesh_enrol` -- Bangladesh enrollment analysis dataset
* `washb_bangladesh_track_compound` -- Bangladesh compound tracking dataset
* `washb_bangladesh_uptake` -- Bangladesh uptake analysis dataset
To load a dataset for analysis, use the command `data(washb_bang_tr)`, replacing `washb_bang_tr` with the appropriate dataset name.
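For example, once the package is installed and loaded (see the installation section below), you can load and inspect the treatment assignment dataset:

```r
# load the treatment assignment dataset and inspect it
library(washb)
data(washb_bang_tr)
str(washb_bang_tr)   # structure of the data frame
head(washb_bang_tr)  # first few rows
```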
For new users of R, the following are helpful learning resources:
1. Calendar of UC Berkeley D-Lab workshops (Look for R bootcamps)
2. Cookbook for R
3. DataCamp: R for SAS, SPSS, and Stata users (for those familiar with statistical programming who only need to learn R-specific syntax; a nice interactive learning interface, a free introductory course, and an academic discount for paid courses)
4. Try R codeschool (An interactive introductory course)
5. UCLA Institute For Digital Research and Education: R resources (Great source for topic-specific tutorials)
6. R-tutor introduction
7. Johns Hopkins Coursera course in R programming
The WASH Benefits R package is developed on GitHub: https://ben-arnold.github.io/washb. To install the R package from GitHub, you will need the `devtools` package. From within R, you can install `devtools` with `install.packages("devtools")`. This only needs to be done once per computer. After installation, load the `devtools` package (type `library(devtools)`), and then install the `washb` package by typing:

```r
devtools::install_github("ben-arnold/washb")
library(washb)
```
We will periodically update the `washb` package with improvements and bug fixes. When a new package version is available, the UCB team will send an email to notify the WASHB community of the update. You can always ensure you have the latest version by re-installing the package. If you encounter a bug, find something frustrating or confusing, or have suggestions for improvements, email Andrew and Ben to let them know!
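For example, to check which version you currently have and then update (a minimal sketch):

```r
packageVersion("washb")                       # version currently installed
devtools::install_github("ben-arnold/washb")  # re-install to get the latest version
```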
The `washb` package relies on several functions from a few packages that extend the basic R installation. These packages need to be installed on each computer once, and loaded into the R workspace each time R is opened. If you are installing the packages for the first time, use the following code:

```r
install.packages("sandwich")
install.packages("lmtest")
install.packages("coin")
install.packages("plyr")
install.packages("metafor")
```
If you plan to use targeted maximum likelihood estimation and machine learning through the `washb_tmle` function, you will additionally need to install:

```r
install.packages("tmle")
install.packages("SuperLearner")
```
Once the packages are installed for the first time, they will be loaded automatically along with the `washb` package when you load it:

```r
library(washb)
```
The `library()` command can be used to load any installed package. It is not needed for any of the required packages listed above, but it is needed for the suggested packages. For example, to fit a negative binomial GLM, the `MASS` package is required; it can be installed once with `install.packages("MASS")` and loaded each time R is opened with `library("MASS")`.

Suggested packages include: nnls, zoo, arm, MASS, Matrix, lme4, gam, splines, foreach, and glmnet.
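If you prefer, the suggested packages can be installed in a single call (a sketch; note that `splines` ships with base R and does not need to be installed from CRAN):

```r
# one-time installation of the suggested packages (splines is part of base R)
install.packages(c("nnls","zoo","arm","MASS","Matrix","lme4",
                   "gam","foreach","glmnet"))
```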
Use `??washb` to see the full list of package documentation, or use `?function_name` to see the help documentation for any specific function. For example, `?washb_glm` returns the help page for the `washb_glm` function.
IMPORTANT: Read this at least once if you are a WASH B investigator.
The WASH Benefits trials were designed as pair-matched, cluster-randomized trials. The pair-matching is sometimes not immediately obvious because the trial has so many arms. But, we enrolled clusters in groups of 8 geographically contiguous clusters (9 in Kenya, due to the passive control arm) and then allocated the clusters to either one of six intervention arms or the double-sized control (Arnold et al. 2013). It is a geographically pair-matched design, because any comparison between two arms is pair-matched within the randomization block. For example, within a randomization block there will be one cluster allocated to the Nutrition arm "matched" to 2 clusters allocated to the control arm.
Our inference assumes that clusters are independent units. That is important for estimating effects, because if clusters were not independent then we could have contamination (spillover) between arms and it would be impossible to use randomization alone to identify the intervention effects. (We formally tested this assumption in the Bangladesh trial and found no evidence for between-cluster spillover effects; tests for Kenya will be complete by late October.)
To correctly estimate the variance in the pair-matched design, we need to treat the independent unit as the pair (randomization block), because there is some dependence in the outcome induced by matching.
When is it reasonable to treat clusters as the independent unit? First, when estimating the marginal mean of a variable, perhaps by trial arm with `washb_mean`, there will be little or no difference between treating clusters versus pairs as the independent unit, since (with the exception of the control arm) there are no replicated units within a pair. In designs that are not pair-matched, such as the NIH R21 substudy (where sanitation and control clusters were not enrolled within matched pairs), it is reasonable to treat the cluster as the independent unit. Treating pairs/blocks as the independent unit is never wrong; it will just be slightly conservative in analyses that are not pair-matched.
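To make this concrete, here is a hedged sketch of how the choice of independent unit translates into the `id` argument of the package functions (`dd` is the diarrhea analysis data frame constructed later in this vignette):

```r
# pair-matched analysis: treat the randomization block as the independent unit
washb_mean(Y=dd$diar7d, id=dd$block)

# analysis that is not pair-matched: treat the cluster as the independent unit
washb_mean(Y=dd$diar7d, id=dd$clusterid)
```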
Below we will use some actual analyses from the Bangladesh trial to illustrate the different functions. To do that, we also need to read in final analysis datasets, merge them to baseline characteristics, and merge on the actual (unblinded) treatment assignments. We also do a little bit of minimal data processing to help facilitate the analyses.
This is an example of how to load and merge the enrollment and diarrhea data with the actual treatment assignments. It creates a final analysis data frame called `dd` (for diarrhea data). These datasets are loaded from the package below (they are also available through OSF: https://osf.io/pqzj5/); if you have trouble accessing them, please email the UCB team.
```r
# load and merge the final analysis files:
# treatment assignments, enrollment characteristics, and diarrhea measurements

# Load data from package (also available through OSF: https://osf.io/pqzj5/)
data(washb_bang_tr)
data(washb_bangladesh_enrol)
data(washb_bangladesh_diar)
washb_bang_tr <- washb_bang_tr
washb_bangladesh_enrol <- washb_bangladesh_enrol
washb_bangladesh_diar <- washb_bangladesh_diar

# drop svydate and month because they are superseded in the child level diarrhea data
washb_bangladesh_enrol$svydate <- NULL
washb_bangladesh_enrol$month <- NULL

# merge the treatment assignments to the baseline dataset
dd <- merge(washb_bangladesh_enrol,washb_bang_tr,by=c("clusterid","block"),all.x=T,all.y=T)

# merge the baseline data with diarrhea measurements (keep only compounds with follow-up data)
dd <- merge(dd,washb_bangladesh_diar,by=c("dataid","clusterid","block"),all.x=F,all.y=T)

# subset to post-intervention measurements: Year 1 or Year 2
dd <- subset(dd,svy==1|svy==2)

# exclude new births that are not index children
dd <- subset(dd,sibnewbirth==0)

# exclude children with missing data
dd <- subset(dd,!is.na(dd$diar7d))

# re-order the tr factor for convenience
dd$tr <- factor(dd$tr,levels=c("Control","Water","Sanitation","Handwashing","WSH","Nutrition","Nutrition + WSH"))

# ensure that month is coded as a factor
dd$month <- factor(dd$month)

# sort the data for perfect replication when using random splits
# for V-fold cross-validation (washb_tmle and adjusted permutation tests)
dd <- dd[order(dd$block,dd$clusterid,dd$dataid,dd$childid),]
```
This is an example of how to load and merge the anthropometry and enrollment datasets. It largely parallels the processing steps for the diarrhea data (above) and creates a final analysis data frame called `lazd`. The anthropometry dataset is loaded from the package below; if you have trouble accessing it, please email the UCB team.
```r
# Load data from package
data(washb_bangladesh_anthro)   # load the anthropometry dataset
washb_bangladesh_anthro <- washb_bangladesh_anthro

# merge to final analysis files, loaded above (keep only compounds with follow-up data)
lazd <- merge(washb_bangladesh_enrol,washb_bang_tr,by=c("clusterid","block"),all.x=T,all.y=T)
lazd <- merge(lazd,washb_bangladesh_anthro,by=c("dataid","clusterid","block"),all.x=F,all.y=T)

# subset to the index (target) children measured in Year 2 (primary outcome)
lazd <- subset(lazd,svy==2)
lazd <- subset(lazd,tchild=="Target child")

# drop children with extreme LAZ values
lazd <- subset(lazd,laz_x!=1)

# re-order the tr factor for convenience
lazd$tr <- factor(lazd$tr,levels=c("Control","Water","Sanitation","Handwashing","WSH","Nutrition","Nutrition + WSH"))

# ensure that month is coded as a factor
lazd$month <- factor(lazd$month)

# rename aged to agedays (for consistency with the diarrhea file)
lazd$agedays <- lazd$aged

# sort the data for perfect replication when using random splits
# for V-fold cross-validation (washb_tmle and adjusted permutation tests)
lazd <- lazd[order(lazd$block,lazd$clusterid,lazd$dataid,lazd$childid),]
```
Now that the datasets are cleaned and merged, the `washb` package functions can be applied, and the results will match the WASH Benefits Bangladesh primary analysis.
The `washb_mean` function is most useful for calculating variable means and confidence intervals -- for example, calculating average compliance (uptake) within a given intervention arm, or calculating average LAZ by arm or measurement round. In the WASH Benefits trials, the independent unit is typically the cluster, so the `id` argument should identify the cluster ID.
Arguments:

* `Y` = Outcome variable.
* `id` = ID variable for independent units (e.g., cluster ID).
* `print` = Logical. If `print=TRUE` (default), the function will print the results.

Usage:

```r
washb_mean(Y, id, print=TRUE)
```
Using the WASH Benefits enrollment survey data and the `washb_mean` function, the means and 95% confidence intervals of several maternal characteristics are calculated below:

```r
MomAge  <- washb_mean(Y=washb_bangladesh_enrol$momage,  id=washb_bangladesh_enrol$clusterid)
MomEduY <- washb_mean(Y=washb_bangladesh_enrol$momeduy, id=washb_bangladesh_enrol$clusterid)
```
The `<-` operator assigns the output of `washb_mean` to the new objects `MomAge` and `MomEduY`. Notice that there are 20 missing observations in the `momage` variable, but none in `momeduy`.
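The saved objects can then be combined into a simple summary table (a sketch; consult the printed output for the exact column names returned by `washb_mean`):

```r
# stack the saved results into one small table
maternal.means <- rbind(MomAge, MomEduY)
rownames(maternal.means) <- c("Mother's age (years)", "Mother's education (years)")
round(maternal.means, 2)
```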
If you want to compare means between groups using a difference, prevalence ratio, or incidence ratio (depending on the outcome), use `washb_ttest` (paired t-test for unadjusted continuous outcomes), `washb_mh` (Mantel-Haenszel test for unadjusted binary outcomes), `washb_glm` (unadjusted or adjusted estimates using generalized linear models), or `washb_tmle` (unadjusted or adjusted estimates using targeted maximum likelihood estimation).
`washb_ttest` conducts a paired t-test for the difference in a continuous outcome between two treatment arms, estimating differences in means within each matched pair (randomization block). Use it for unadjusted analyses of continuous outcomes, where the parameter of interest is the difference in means.
Arguments:

* `Y` = continuous outcome (in this example, child length-for-age Z-score `lazd$laz`)
* `tr` = binary treatment group variable, comparison group first
* `strat` = stratification variable (in WASH Benefits: the pair-matched block variable)
* `contrast` = character vector of the format `c("contrast1","contrast2")` giving the two factor levels within the treatment variable (`tr`) to compare; `"contrast1"` is the reference group and `"contrast2"` is the active comparison

Usage:

```r
washb_ttest(Y, tr, strat, contrast)
```
Below, `washb_ttest` is used to conduct a paired t-test of the difference in LAZ between the nutrition arm and the control arm in the Bangladesh trial. The code chunk also provides an example of how to "vectorize" the call across arms, which works because `washb_ttest` returns a simple vector of numbers (this type of efficient programming does not work well with functions that return more complex output, such as `washb_glm` or `washb_tmle`).
```r
# Run washb_ttest on the Nutrition vs. Control arm comparison
ttest.C.N <- washb_ttest(Y=lazd$laz, tr=lazd$tr, strat=lazd$block,
                         contrast=c("Control","Nutrition"))
round(ttest.C.N, 4)
```
Advanced R tip: for a function like `washb_ttest` that returns a simple vector or list of single parameter estimates, you can use the `sapply` command to efficiently apply the function to all of the treatment arm contrasts:
```r
# Create a list of contrasts for the pre-specified hypothesis 1
# (each intervention vs. control) to use with sapply
h1.contrasts <- list(
  c("Control","Water"),
  c("Control","Sanitation"),
  c("Control","Handwashing"),
  c("Control","WSH"),
  c("Control","Nutrition"),
  c("Control","Nutrition + WSH")
)

# Use sapply to apply washb_ttest() across all contrasts
diff.h1LAZ <- t(sapply(h1.contrasts, washb_ttest, Y=lazd$laz, tr=lazd$tr, strat=lazd$block))
rownames(diff.h1LAZ) <- c("Water v C","Sanitation v C","Handwashing v C",
                          "WSH v C","Nutrition v C","Nutrition + WSH v C")
round(diff.h1LAZ, 4)
```
`washb_mh` estimates a treatment effect (either a difference or ratio) for a binary outcome, pooled across strata. In the WASH Benefits trials, this function can estimate the unadjusted prevalence difference or ratio while accounting for pair-matching in the design (where the pair is the stratifying variable).
Arguments:

* `Y` = binary outcome (in this example, 7-day diarrheal disease recall `dd$diar7d`)
* `tr` = binary treatment group variable, comparison group first
* `strat` = stratification variable (in WASH Benefits: the pair-matched block variable)
* `contrast` = character vector of the format `c("contrast1","contrast2")` giving the two factor levels within the treatment variable (`tr`) to compare; `"contrast1"` is the reference group and `"contrast2"` is the active comparison
* `measure` = measure of effect: `"RR"` = prevalence ratio, `"RD"` = prevalence difference

Usage:

```r
washb_mh(Y, tr, strat, contrast, measure="RR")
```
Apply `washb_mh` to the Sanitation vs. Control arm contrast (with the `round()` function added to clean the output and keep it from spilling across multiple lines):

```r
mh.C.S <- washb_mh(Y=dd$diar7d, tr=dd$tr, contrast=c("Control","Sanitation"),
                   strat=dd$block, measure="RR")
round(mh.C.S, 4)
```
The `washb_mh` function can also be used to calculate an unadjusted risk difference:

```r
round(washb_mh(Y=dd$diar7d, tr=dd$tr, contrast=c("Control","Sanitation"),
               strat=dd$block, measure="RD"), 4)
```
Advanced R tip: as with `washb_ttest`, `washb_mh` returns a simple vector of single parameter estimates, so you can use the `sapply` command to efficiently apply the function to all of the treatment arm contrasts:
```r
# Create a list of contrasts for the pre-specified hypothesis 1
# (each intervention vs. control) to use with sapply
h1.contrasts <- list(
  c("Control","Water"),
  c("Control","Sanitation"),
  c("Control","Handwashing"),
  c("Control","WSH"),
  c("Control","Nutrition"),
  c("Control","Nutrition + WSH")
)

# Apply sapply to run the function on each comparison in the h1.contrasts list
diff.h1 <- t(sapply(h1.contrasts, washb_mh, Y=dd$diar7d, tr=dd$tr, strat=dd$block, measure="RR"))
rownames(diff.h1) <- c("Water v C","Sanitation v C","Handwashing v C",
                       "WSH v C","Nutrition v C","Nutrition + WSH v C")
print(round(diff.h1, 4))
```
`washb_glm` fits a generalized linear model to estimate intention-to-treat (ITT) effects in a trial. The `contrast` argument enables you to specify the arms that you wish to compare (reference group first, comparison group second). To estimate adjusted effects, you can provide a data frame of adjustment covariates using the `W` argument -- by default, covariates are pre-screened to include only those associated with the outcome based on a likelihood ratio test. To override the pre-screening algorithm for some or all covariates, use the `forcedW` argument.

If the design is pair-matched (all primary outcome analyses in WASH B should be pair-matched), use the `pair` argument to specify an ID variable for pairs, and specify the same variable in the `id` argument to get correct SEs. If the design is not pair-matched, the `id` argument should identify the smallest independent unit in the trial (e.g., cluster). The function computes robust standard errors.

Note that this function automatically drops observations from the analysis if they are from a pair with no treatment contrast -- this can happen with incomplete randomization blocks in the Kenya trial.
Note that for binary outcomes such as diarrhea, you can estimate the prevalence ratio in a GLM by using a log link (`family=binomial(link='log')`) rather than the canonical logit link (`family='binomial'`), which estimates the odds ratio (not recommended, because ORs are harder to interpret). Occasionally, a GLM with a non-canonical link will fail to converge, particularly if the data are sparse. If this occurs for a binomial family with the log link, we recommend modified Poisson regression to estimate the prevalence ratio, using the argument `family=poisson(link='log')`.
See Zou 2004 and Yelland et al. 2011 for details.
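Here is a minimal sketch of the modified Poisson fallback described above, using the diarrhea data frame `dd` constructed earlier (the object name is illustrative):

```r
# modified Poisson regression for the prevalence ratio, used when the
# log-binomial model (family=binomial(link='log')) fails to converge
Diar.pois.C.S <- washb_glm(Y=dd$diar7d, tr=dd$tr, pair=dd$block, id=dd$block,
                           contrast=c("Control","Sanitation"),
                           family=poisson(link='log'))
```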
Finally, `washb_glm` also makes it straightforward to estimate conditional (i.e., subgroup) effects using the optional `V` argument (example below).
Arguments:

* `Y` = binary, count, or continuous outcome
* `tr` = binary treatment group variable, comparison group first
* `pair` = stratification variable (in WASH Benefits: the pair-matched block variable)
* `W` = optional data frame that includes adjustment covariates (for adjusted estimates). `W` will be pre-screened, and only covariates associated with the outcome will be adjusted for. See `washb_prescreen` for more details.
* `forcedW` = optional vector of variable names to force in as adjustment covariates (no screening)
* `V` = optional variable name for subgroup analyses, which is interacted with `tr`. Continuous variables can be used and an interaction term will be fit, but the calculated point estimates and confidence intervals for the treatment effect between subgroups will only be returned if `V` is a factor.
* `id` = ID of the independent unit (cluster, or pair in pair-matched analyses) used in calculating robust standard errors
* `contrast` = character vector of the format `c("contrast1","contrast2")` giving the two factor levels within the treatment variable (`tr`) to compare; `"contrast1"` is the reference group and `"contrast2"` is the active comparison
* `family` = GLM model family (gaussian, binomial, poisson, or negative binomial). Use `binomial(link='log')` to return prevalence ratios instead of odds ratios when the outcome is binary. Use `"neg.binom"` for a negative binomial model.
* `pval` = p-value threshold used to pre-screen the set of potential adjustment covariates `W`. Any variable in `W` with a likelihood ratio test p-value below this threshold will be included in the final adjustment set. Defaults to 0.2.
* `print` = logical (TRUE or FALSE, defaults to TRUE) for whether to print output or just store it in the assigned object
* `verbose` = logical (TRUE or FALSE, defaults to FALSE) for whether to print a description of the returned objects

Usage:

```r
washb_glm(Y, tr, pair, W=NULL, forcedW=NULL, V=NULL, id, contrast,
          family='gaussian', pval=0.2, print=TRUE)
```
Returned objects:

* `objectname$TR` -- the treatment effect and inference.
* `objectname$fit` -- the full GLM model estimates.
* `objectname$vcv` -- the variance-covariance matrix.
* `objectname$rowdropped` -- a vector indicating the observations included in the model fit.
* `objectname$lincom` -- subgroup-specific conditional relative risk estimates if a subgroup `V` is specified.
* `objectname$glmModel` -- the GLM model fit, which can be used with `predict()` to return model predictions of the outcome. Note that this is the GLM fit without standard errors adjusted for repeated measures, so you should not use it for inference if the data include repeated observations and/or if you have fit a modified Poisson model. It is safest to use the returned `$TR` and `$fit` objects for inference, which include robust SEs.
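For example, predicted outcomes can be obtained from the returned model fit (a sketch that assumes the `Diar.glm.C.S` object created in the next example; remember the caution above about not using `$glmModel` for inference):

```r
# predicted 7-day diarrhea prevalence for each observation in the fitted model
phat <- predict(Diar.glm.C.S$glmModel, type="response")
summary(phat)
```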
As an example, the following code applies the `washb_glm` function to compare diarrhea prevalence between the sanitation and control arms. To estimate the prevalence ratio (PR), we fit the GLM with a log link, `family=binomial(link='log')`. Occasionally, a GLM with a non-canonical link function like `binomial(link='log')` will fail to converge; if this occurs, use modified Poisson regression to estimate the prevalence ratio via `family=poisson(link='log')` (see the details and references above).
```r
Diar.glm.C.S <- washb_glm(Y=dd$diar7d, tr=dd$tr, pair=dd$block, id=dd$block,
                          contrast=c("Control","Sanitation"),
                          family=binomial(link='log'))
```
In addition to the function's auto-printed output, the object returned by `washb_glm` contains a number of components. For example, `objectname$TR` returns just the treatment effect for the intervention arm, which is useful when saving a row to an R object that collects the treatment effects across all of the desired arm contrasts.
```r
round(Diar.glm.C.S$TR, 4)
```
This result shows that the sanitation intervention led to a (1 - `r sprintf("%1.2f",Diar.glm.C.S$TR[1])`) = `r sprintf("%1.0f",100*(1-Diar.glm.C.S$TR[1]))` percent relative reduction in diarrhea compared with the control arm.
To estimate a difference between arms for a binary outcome (the prevalence difference or risk difference, depending on the outcome definition), the syntax is the same as in the previous example, except that you specify a linear (rather than log-linear) model with `family='gaussian'`. This is sometimes called a linear probability model.
```r
RD.Diar.glm.C.S <- washb_glm(Y=dd$diar7d, tr=dd$tr, pair=dd$block, id=dd$block,
                             contrast=c("Control","Sanitation"),
                             family='gaussian')
```
This result shows that the sanitation intervention led to a `r sprintf("%1.1f",100*(RD.Diar.glm.C.S$TR[1]))` percentage point (absolute) reduction in diarrhea compared with the control arm.
As another example, the following code applies the `washb_glm` function to compare LAZ between the sanitation and control arms of the trial. Since it is a pair-matched design, we specify `pair=lazd$block` and `id=lazd$block`, where `block` is the randomization block (which identifies matched pairs). For a continuous outcome, the family should be `family='gaussian'`. Finally, the `contrast` argument needs to correspond to levels of the `tr` variable. In this example, `tr` is a factor with many levels, but we want to compare sanitation to control -- the reference group is listed first: `contrast=c("Control","Sanitation")`. The full command and results are:
```r
LAZ.glm.C.S <- washb_glm(Y=lazd$laz, tr=lazd$tr, pair=lazd$block, id=lazd$block,
                         contrast=c("Control","Sanitation"),
                         family="gaussian", verbose=FALSE)
```
This result shows there was no evidence for a difference in LAZ between the sanitation and control arms.
The `W` argument in `washb_glm` enables you to adjust for covariates in the GLM -- `W` simply requires a data frame of potential adjustment covariates.

If you embark on adjusted analyses, it can be convenient to create a vector of the adjustment variable names to screen for inclusion in adjusted GLM models. The convenience of this approach will be apparent in the examples below, where we repeatedly refer to this list of variables when subsetting the data to specify adjustment covariates.
```r
# list of potential adjustment covariates
Wadj <- c("month","agedays","sex","momage","momedu","momheight","hfiacat",
          "Nlt18","Ncomp","watmin","elec","floor","walls","roof",
          "asset_wardrobe","asset_table","asset_chair","asset_khat",
          "asset_chouki","asset_tv","asset_refrig","asset_bike",
          "asset_moto","asset_sewmach","asset_mobile")
```
For the diarrhea analysis data frame created above, named `dd`, you can now subset to the adjustment covariates using the syntax `dd[Wadj]`. Similarly, for the LAZ analysis data frame named `lazd`, you can subset to the adjustment covariates with `lazd[Wadj]`.
If `W` is specified, `washb_glm` will pre-screen the covariates in the data frame provided in the `W` argument. Variables with a p-value < 0.2 from a likelihood ratio test are included as adjustment covariates in the final model. The `pval` argument can be used to change the p-value threshold used to screen the covariates.
```r
adj.Diar.glm.C.S <- washb_glm(Y=dd$diar7d, tr=dd$tr, pair=dd$block,
                              W=dd[Wadj], id=dd$block,
                              contrast=c("Control","Sanitation"),
                              family=binomial(link='log'))
```
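For example, a stricter screening threshold can be passed directly through `pval` (a sketch; `pval=0.1` and the object name are illustrative, not a trial recommendation):

```r
# same adjusted model, but screening covariates at a stricter P<0.1 threshold
adj.Diar.glm.p10 <- washb_glm(Y=dd$diar7d, tr=dd$tr, pair=dd$block,
                              W=dd[Wadj], id=dd$block,
                              contrast=c("Control","Sanitation"),
                              family=binomial(link='log'), pval=0.1)
```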
Forced adjustment covariates
If there are specific adjustment covariates you want to include in the GLM regardless of the pre-screening results, you can use the `forcedW` argument. It takes a variable name or a vector of variable names; these variables must already be in the `W` data frame of potential adjustment covariates. For example, if adjustment for age and sex is standard, force them into the GLM:
```r
glm.C.S <- washb_glm(Y=dd$diar7d, tr=dd$tr, pair=dd$block,
                     W=dd[Wadj], forcedW=c("agedays","sex"), id=dd$block,
                     contrast=c("Control","Sanitation"),
                     family=binomial(link='log'))
```
The `V` argument can be used to create an interaction term with `tr`, the treatment variable. You do not pass an actual vector to `V` -- instead, you specify the string name of a variable that is included in the `W` data frame. Below is an example of a subgroup analysis of the effect of the nutrition intervention on diarrheal disease, stratified by index child status (index vs. non-index). Note how we include `"tchild"` in both the `forcedW` and `V` arguments -- this ensures that the stratification variable is not dropped if it is not marginally associated with the outcome.
```r
# Estimate the subgroup analysis GLM with washb_glm,
# stratified by index child status ("tchild"), on the relative scale
glm.C.N.byChildType <- washb_glm(Y=dd$diar7d, tr=dd$tr, pair=dd$block,
                                 W=dd["tchild"], V="tchild", id=dd$block,
                                 contrast=c("Control","Nutrition"),
                                 family=binomial(link='log'), verbose=FALSE)

# Examine the treatment effect across subgroups with objectname$lincom
glm.C.N.byChildType$lincom

# Repeat the subgroup analysis on the absolute (difference) scale;
# saved under a new name so the relative-scale fit above is not overwritten
glm.C.N.byChildType.RD <- washb_glm(Y=dd$diar7d, tr=dd$tr, pair=dd$block,
                                    W=dd["tchild"], V="tchild", id=dd$block,
                                    contrast=c("Control","Nutrition"),
                                    family="gaussian", verbose=FALSE)

# Examine the treatment effect across subgroups with objectname$lincom
glm.C.N.byChildType.RD$lincom
```
```r
# Subgroup analysis with a multi-level factor:
# the V argument can also be used with multi-level factors. Here is code for a
# subgroup analysis of the effect of the nutrition intervention on endline LAZ,
# by household food security level (hfiacat)
glm.C.N.byFoodSecurity <- washb_glm(Y=lazd$laz, tr=lazd$tr, pair=lazd$block,
                                    W=lazd["hfiacat"], forcedW=NULL, V="hfiacat",
                                    id=lazd$block,
                                    contrast=c("Control","Nutrition"),
                                    family="gaussian", verbose=FALSE)
```
Estimate intention-to-treat parameters using targeted maximum likelihood estimation (TMLE), potentially adjusted for covariates and missing outcomes
The `washb_tmle` function is mainly a convenience wrapper for the `tmle` package. It estimates ITT effects in a trial using targeted maximum likelihood estimation (TMLE). In brief, the function restricts the data to complete observations in the two arms listed in the `contrast` argument, pre-screens the covariates (`W`), if specified, to select those that have a univariate association with the outcome, and then estimates the intention-to-treat effect using TMLE. If `family='binomial'`, the function returns effects on the absolute, relative, and odds ratio scales. If `Delta` is specified (i.e., observations with missing outcomes are included), the function will adjust the effects for missing values using inverse probability of censoring weights, with the weights estimated using data-adaptive super learning for Pr(Delta|A,W).
If the analysis is pair-matched (as for primary outcomes), be sure to specify the pair ID in the `id` argument. Do not include pair IDs in the adjustment covariate set.
If you specify adjustment covariates (`W`), then by default `washb_tmle` pre-screens them, and the subset associated with the outcome based on a likelihood ratio test is used in the estimation. There are some other important defaults to be aware of. First, `washb_tmle` estimates the treatment mechanism even though it is a randomized trial. There are two reasons for this -- one theoretical and one practical. The theoretical reason is that estimating the treatment mechanism gains efficiency (see Balzer et al. 2016); the practical reason is that unless the analysis is conducted at the cluster level (i.e., providing cluster means to the `washb_tmle` function), the empirical treatment probabilities differ slightly due to varying cluster sizes. Estimating the treatment mechanism ensures that the variance calculation correctly accounts for the empirical treatment probabilities in the data.
Another default is that `washb_tmle` uses the `SuperLearner` algorithm to adjust for covariates and to predict the treatment mechanism and censoring mechanism (if adjusting for missing outcomes). The default algorithm library includes the simple mean, main-terms GLM, main-terms Bayesian GLM with non-informative priors, generalized additive models (degree 2), and the lasso (glmnet). These are the pre-specified algorithms from the original trial statistical analysis plan. You can type `listWrappers()` to see the full set of algorithms implemented in the SuperLearner package. If you just want to use a main-effects GLM to adjust for the covariates with TMLE, you can specify `Q.SL.library="SL.glm"`. If you are working with very small sample sizes (e.g., in a substudy), you may wish to use even simpler libraries, such as a set of univariate regressions (as in Balzer et al. 2016 -- but it is probably best to check with the UCB team before making big changes to the recommended algorithm library).
Finally, by default the function uses the same algorithm library to predict the outcome (`Q.SL.library`) and the treatment and censoring mechanisms. You can specify a different library for the treatment and censoring mechanisms using the `g.SL.library` argument.
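To make the points above concrete, here is a hedged sketch that lists the available learners and then fits an adjusted TMLE with a simpler outcome library (`lazd` and `Wadj` are created earlier in this vignette; the object name is illustrative):

```r
# list all algorithm wrappers available in the SuperLearner package
library(SuperLearner)
listWrappers()

# adjusted TMLE using only a main-terms GLM for the outcome model
simple.tmle <- washb_tmle(Y=lazd$laz, tr=lazd$tr, pair=lazd$block, id=lazd$block,
                          W=lazd[Wadj], contrast=c("Control","Nutrition"),
                          family="gaussian", Q.SL.library="SL.glm", seed=12345)
```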
If you want to adjust for missing outcomes in the analysis, you need to include the observations that have a missing outcome (`Y`), with `Delta=0` for those observations. Observations with missing outcomes should still have treatment (`tr`) and covariate (`W`) information, which is used to create the weights for Pr(Delta|A,W).
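A minimal sketch of the required setup (here `Y` stands for your outcome vector; the full worked IPCW example appears later in this vignette):

```r
# indicator of an observed outcome: 1 = observed, 0 = missing
Delta <- ifelse(is.na(Y), 0, 1)

# replace missing outcomes with an arbitrary non-missing placeholder
# before calling washb_tmle (the worked example below uses 9)
Yobs <- Y
Yobs[Delta == 0] <- 9
```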
Arguments:

* `Y` = outcome variable (continuous, such as LAZ, or binary, such as diarrhea)
* `tr` = treatment group variable (binary or factor)
* `W` = data frame that includes adjustment covariates
* `id` = ID variable for independent units. For pair-matched designs this is the matched pair, and it should be the same as the `pair` argument. For analyses that are not pair-matched, it should typically be the cluster.
* `pair` = an optional ID variable identifying the matched pair unit (in WASH Benefits: block) when conducting a matched-pair analysis. This argument is used to drop pairs that are missing one or more treatment groups. Incomplete pairs are not an issue in the overall Bangladesh trial (there were no incomplete blocks), but they are an issue in the Kenya trial, where some blocks were incomplete.
* `Delta` = indicator of a missing outcome: 1 = observed, 0 = missing
* `family` = outcome family: `gaussian` (continuous outcomes, like LAZ) or `binomial` (binary outcomes, like diarrhea or stunting)
* `contrast` = vector of length 2 that includes the treatment groups to contrast (e.g., `contrast=c('Control','Nutrition')`)
* `Q.SL.library` = library of algorithms to include in the SuperLearner for the outcome model
* `g.SL.library` = library of algorithms to include in the SuperLearner for the treatment model $Pr(A|W)$ and the missingness model $Pr(\Delta|A,W)$ (if `Delta` is specified)
* `pval` = p-value threshold used to pre-screen the covariates (`W`) based on a likelihood ratio test in a univariate regression with the outcome (`Y`). Variables with a univariate association p-value below this threshold will be used in the final model. Defaults to 0.2.
* `seed` = a seed for the pseudo-random cross-validation splits used in model selection (use for perfectly reproducible results)
* `print` = logical for printed output, defaults to TRUE. If FALSE, no output will be printed to the console when the returned object is saved to an R object.

Usage:

```r
washb_tmle(Y, tr, W=NULL, id, pair=NULL, Delta=rep(1,length(Y)),
           family="gaussian", contrast,
           Q.SL.library=c("SL.mean","SL.glm","SL.bayesglm","SL.gam","SL.glmnet"),
           g.SL.library=Q.SL.library, pval=0.2, seed=NULL, print=TRUE)
```
References:

1. van der Laan MJ, Rose S. Targeted Learning: Causal Inference for Observational and Experimental Data. Springer Series in Statistics; 2011. Link
2. Gruber S, van der Laan M. tmle: An R Package for Targeted Maximum Likelihood Estimation. J Stat Softw. 2012;51: 1–35. Link
3. Balzer LB, van der Laan MJ, Petersen ML, SEARCH Collaboration. Adaptive pre-specification in randomized trials with and without pair-matching. Stat Med. 2016; doi:10.1002/sim.7023. Link
4. Schuler MS, Rose S. Targeted Maximum Likelihood Estimation for Causal Inference in Observational Studies. Am J Epidemiol. 2017;185: 65–73. Link
By design, `washb_tmle` has similar arguments to `washb_glm`, with the addition of the super learner libraries. The default library (`c("SL.mean","SL.glm","SL.bayesglm","SL.gam","SL.glmnet")`) is pretty good in this context, so we don't actually specify it below (instead relying on the default value).

Here is an example of estimating an adjusted difference in LAZ between the nutrition and control arms in the trial:
```r
adj.LAZ.tmle.C.N <- washb_tmle(Y=lazd$laz, tr=lazd$tr, pair=lazd$block, id=lazd$block,
                               W=lazd[Wadj], contrast=c("Control","Nutrition"),
                               family="gaussian")
```
The default output notifies you if any observations were dropped due to missing values. It also summarizes the weighting of the models or algorithms used to estimate the outcome model (estimation of `Q`), the treatment mechanism (estimation of `g`), and the missingness mechanism (estimation of `g.Delta`).

The function returns an object of class `tmle` (see the `tmle` package for details). The effect estimates are stored in the list `$estimates$ATE`:
```r
unlist(adj.LAZ.tmle.C.N$estimates$ATE)
```
For binary outcomes such as diarrhea, the `washb_tmle` syntax is almost unchanged, but the family should be specified as `family="binomial"`. Unlike GLM, you don't (and shouldn't) specify a link function -- TMLE is a non-parametric estimation approach, so it can estimate any function of the means in the treatment groups. The software currently estimates the difference, ratio, and odds ratio. You can access the different parameter estimates through the `estimates$` lists:
```r
adj.Diar.tmle.C.S <- washb_tmle(Y=dd$diar7d, tr=dd$tr, pair=dd$block, id=dd$block,
                                W=dd[Wadj], contrast=c("Control","Sanitation"),
                                family="binomial",
                                Q.SL.library=c("SL.mean","SL.glm","SL.bayesglm","SL.gam","SL.glmnet"))

# difference
round(unlist(adj.Diar.tmle.C.S$estimates$ATE), 4)
# prevalence ratio
round(unlist(adj.Diar.tmle.C.S$estimates$RR), 4)
# odds ratio
round(unlist(adj.Diar.tmle.C.S$estimates$OR), 4)
```
The `washb_tmle` function enables you to account for missing outcome data by adding inverse probability of censoring weights (IPCW) to the estimator. You can do this with the `Delta` argument, which specifies an indicator of whether the outcome was observed. A key step for an IPCW analysis that differs from the other analyses in this vignette is that you need to create the "full" data, with an observation for every potential measurement (i.e., including those that were missing in the trial). To do this, use the enrolled study population, limited to live births (since the trial's inference is limited to live births). Below, we provide an example using the control and nutrition arms of the Bangladesh trial.
```r
library(tmle)
library(SuperLearner)
library(nnls)
library(lmtest)
library(zoo)
library(arm)
library(MASS)
library(Matrix)
library(lme4)
library(gam)
library(splines)
library(foreach)
library(glmnet)
```
```r
#---------------------------------------
# create a shell of the full data (dfull)
# as if every index child with a live birth
# were measured at year 1 and year 2
#---------------------------------------

# Load data from package
data(washb_bangladesh_track_compound)
washb_bangladesh_track_compound <- washb_bangladesh_track_compound
table(washb_bangladesh_track_compound$miss1reason=="No live birth")
# total number of unique compounds w/ live birth = 5190

# create the full data (i.e., a dataset with rows for every potential measurement)
dfull <- subset(washb_bangladesh_track_compound,miss1reason!="No live birth")
dfull$svy <- 1
dfull2 <- dfull
dfull2$svy <- 2
dfull <- rbind(dfull,dfull2)

# merge treatment and enrollment data onto this shell of the full data
dfull <- merge(dfull,washb_bang_tr,by=c("clusterid","block","tr"),all.x=T,all.y=F)
dfull <- merge(dfull,washb_bangladesh_enrol,by=c("dataid","clusterid","block","tr"),all.x=T,all.y=F)

# Since we are only analyzing the primary outcome at year 2 in this example,
# we need to re-subset the full data to year 2,
# just before merging to the LAZ data (lazd, created above)
dfull <- subset(dfull,svy==2)

# now merge the observed anthropometry outcomes onto this full dataset
lazdfull <- merge(dfull,washb_bangladesh_anthro,by=c("dataid","clusterid","block","svy","tr"),all.x=T,all.y=T)
lazdfull$agedays <- lazdfull$aged  # naming consistency across files

# create an indicator equal to 1 if the outcome is observed, 0 otherwise
lazdfull$Delta <- ifelse(is.na(lazdfull$laz),0,1)
table(lazdfull$Delta[lazdfull$tr=="Control"|lazdfull$tr=="Nutrition"])

# set missing outcomes to an arbitrary, non-missing value. In this case use 9
lazdfull$Ydelta <- lazdfull$laz
lazdfull$Ydelta[lazdfull$Delta==0] <- 9

# estimate an IPCW-TMLE parameter
psi_ipcw <- washb_tmle(Y=lazdfull$Ydelta, Delta=lazdfull$Delta, tr=lazdfull$tr,
                       id=lazdfull$block, pair=lazdfull$block,
                       family="gaussian", contrast=c("Control","Nutrition"),
                       W=lazdfull[Wadj], seed=12345)
```
In this particular example, accounting for missing outcomes makes very little difference after adjusting for covariates -- the adjusted and adjusted+IPCW estimates are nearly identical.
```r
#---------------------------------------
# compare the estimates
#---------------------------------------
round(unlist(adj.LAZ.tmle.C.N$estimates$ATE), 4)
round(unlist(psi_ipcw$estimates$ATE), 4)
```
In unadjusted ITT analyses, all of the estimators discussed above are asymptotically equivalent: as the sample size increases to infinity, the estimates converge to the same number. So for unadjusted analyses, the choice of estimator is not very consequential. The primary analyses in Kenya and Bangladesh used paired t-tests (continuous outcomes) and stratified Mantel-Haenszel (binary outcomes) estimators for unadjusted analyses, and TMLE for adjusted analyses. GLM is a perfectly sensible approach as well, and one advantage of GLM is that you can use the same estimator through all analyses (unadjusted and adjusted).

TMLE with super learning has some advantages in adjusted analyses: it will likely be more powerful than GLM because of the data-adaptive estimation used to predict the outcome. Note that a main-effects GLM with covariates is a TMLE estimator, as long as there are no interactions between covariates and the treatment arm, just assuming a parametric submodel (confused yet? See Balzer et al. 2016 and/or talk to us at UCB). One final advantage of TMLE is that it enables you to account for missing outcomes, which are common for some secondary specimen outcomes.
Unadjusted analyses
As seen below, the estimates and 95 percent confidence intervals are virtually the same across estimators in an unadjusted analysis, even in this finite sample example (Nutrition vs. Control LAZ).
```r
# t-test
ttest.C.N <- washb_ttest(Y=lazd$laz, tr=lazd$tr, strat=lazd$block,
                         contrast=c("Control","Nutrition"))
round(ttest.C.N, 4)

# GLM, unadjusted
LAZ.glm.C.N <- washb_glm(Y=lazd$laz, tr=lazd$tr, pair=lazd$block, id=lazd$block,
                         contrast=c("Control","Nutrition"),
                         family="gaussian", print=FALSE)
round(LAZ.glm.C.N$TR, 4)

# TMLE, unadjusted
LAZ.tmle.C.N <- washb_tmle(Y=lazd$laz, tr=lazd$tr, pair=lazd$block, id=lazd$block,
                           W=NULL, contrast=c("Control","Nutrition"),
                           family="gaussian", print=FALSE, seed=12345)
round(unlist(LAZ.tmle.C.N$estimates$ATE), 4)
```
Adjusted analyses
The GLM and TMLE estimators are also similar in adjusted analyses. Below, we repeat the contrast above, additionally adjusting for pre-specified covariates by specifying `W=lazd[Wadj]`. In this particular example, the estimates and confidence intervals are very similar between the two approaches. We have found over a large number of comparisons that the TMLE estimator is as or more efficient than the GLM estimator.
```r
# GLM, adjusted
adj.LAZ.glm.C.N <- washb_glm(Y=lazd$laz, tr=lazd$tr, pair=lazd$block, id=lazd$block,
                             W=lazd[Wadj], contrast=c("Control","Nutrition"),
                             family="gaussian", print=FALSE)
round(adj.LAZ.glm.C.N$TR, 4)

# TMLE, adjusted
adj.LAZ.tmle.C.N <- washb_tmle(Y=lazd$laz, tr=lazd$tr, pair=lazd$block, id=lazd$block,
                               W=lazd[Wadj], contrast=c("Control","Nutrition"),
                               family="gaussian", print=FALSE, seed=12345)
round(unlist(adj.LAZ.tmle.C.N$estimates$ATE), 4)
```
Unadjusted analysis
As seen below, the estimates and 95 percent confidence intervals for the prevalence ratio of diarrhea (a binary outcome) are virtually the same across estimators, even in this finite sample example (Sanitation vs. Control).
```r
# Mantel-Haenszel (estimated above)
mh.C.S <- washb_mh(Y=dd$diar7d, tr=dd$tr, contrast=c("Control","Sanitation"),
                   strat=dd$block, measure="RR")
round(mh.C.S, 4)

# GLM, unadjusted
Diar.glm.C.S <- washb_glm(Y=dd$diar7d, tr=dd$tr, pair=dd$block, id=dd$block,
                          contrast=c("Control","Sanitation"),
                          family=binomial(link='log'), print=FALSE)
round(Diar.glm.C.S$TR, 4)

# TMLE, unadjusted
Diar.tmle.C.S <- washb_tmle(Y=dd$diar7d, tr=dd$tr, pair=dd$block, id=dd$block,
                            W=NULL, contrast=c("Control","Sanitation"),
                            family="binomial", print=FALSE, seed=12345)
round(unlist(Diar.tmle.C.S$estimates$RR), 4)
```
Adjusted analyses
The GLM and TMLE estimators are also similar in adjusted analyses. Below, we repeat the contrast above, additionally adjusting for pre-specified covariates by specifying `W=dd[Wadj]`. In this particular example, the estimates and confidence intervals are very similar between the two approaches. We have found over a large number of comparisons that the TMLE estimator is as or more efficient than the GLM estimator.
```r
# GLM, adjusted
adj.Diar.glm.C.S <- washb_glm(Y=dd$diar7d, tr=dd$tr, pair=dd$block, id=dd$block,
                              W=dd[Wadj], contrast=c("Control","Sanitation"),
                              family=binomial(link='log'), print=FALSE)
round(adj.Diar.glm.C.S$TR, 4)

# TMLE, adjusted
adj.Diar.tmle.C.S <- washb_tmle(Y=dd$diar7d, tr=dd$tr, pair=dd$block, id=dd$block,
                                W=dd[Wadj], contrast=c("Control","Sanitation"),
                                family="binomial", print=FALSE, seed=12345)
round(unlist(adj.Diar.tmle.C.S$estimates$RR), 4)
```
Non-parametric test of complete independence (i.e., the strong null hypothesis) using a Wilcoxon signed rank permutation test.

`washb_permute` is the WASH Benefits permutation test function for two treatment arms, conditional on randomization block. It conducts a permutation test of the independence of `Y` and `tr`, conditional on matched pair (randomization block), using the Wilcoxon rank-sum test statistic.
The `washb_glm` and related functions (above) enable us to test hypotheses about whether the average outcome differs between trial arms -- that is, whether the difference in means is equal to zero. But the mean is just one function of the outcome distribution. A sharper test is of the "sharp null hypothesis," first proposed by Ronald Fisher, in which we test whether there is any difference at all in the outcome distributions between intervention arms. Under the null hypothesis of no treatment effect, differences between the outcome distributions should be indistinguishable from random sampling variation. One way to test the sharp null hypothesis is with a permutation test (also called a Fisher randomization test). The intuition behind the test is that there is a single source of random variation in the trial: the random allocation of treatment. If we re-randomize treatment in every possible combination (or a very large number of combinations) and compare arms using a test statistic for each combination, then this provides us with the test statistic's distribution under the null hypothesis of no effect (since we re-shuffle treatment in each iteration, it is completely uninformative). We can then compare the observed test statistic, computed with the real group assignments, to this null distribution to see how likely the result would be to occur by chance. For more details on randomization tests, see the references below.
In the WASH Benefits primary analysis, we pre-specified that we would test this sharp null using a Wilcoxon signed rank test, a non-parametric test statistic that has been shown to have good power against alternatives for outcomes with potentially skewed distributions (Imbens GW, Rubin DB. Causal Inference in Statistics, Social, and Biomedical Sciences. Cambridge University Press; 2015).
Note, however, that in every test we have performed, p-values obtained under this strong null hypothesis have been entirely consistent with p-values obtained from paired t-tests, Mantel-Haenszel chi-squared tests, and even adjusted estimates. Going the extra mile to run these tests may therefore be unnecessary for many analyses in the WASH Benefits trials. Nevertheless, we have included the function in the package because it might be useful.
Arguments:

* `Y` = outcome variable (continuous, such as LAZ, or binary, such as diarrhea)
* `tr` = binary treatment group variable (ideally a factor), comparison group first
* `pair` = pair-matched randomization ID variable (in WASH Benefits: block)
* `contrast` = vector of length 2 that includes the groups to contrast, e.g., `c("Control","Water")`
* `nreps` = number of permutations to run
* `seed` = number for pseudo-random number generation in R

Usage:

```r
washb_permute(Y, tr, pair, contrast, nreps=100000, seed=NULL)
```
References:

1. Gail MH, Mark SD, Carroll RJ, Green SB, Pee D. On Design Considerations and Randomization-Based Inference for Community Intervention Trials. Statist Med. 1996;15: 1069–1092. Link
2. Braun TM, Feng Z. Optimal Permutation Tests for the Analysis of Group Randomized Trials. Journal of the American Statistical Association. 2001;96: 1424–1432. Link
3. Rosenbaum PR. Covariance Adjustment in Randomized Experiments and Observational Studies. Statist Sci. 2002;17: 286–327. Link
Apply the `washb_permute` function to the Sanitation vs. Control arm comparison for diarrhea:
```r
permute.C.S <- washb_permute(Y=dd$diar7d, tr=dd$tr, pair=dd$block,
                             contrast=c("Control","Sanitation"),
                             nreps=100000, seed=242524)
```
The permutation test can also test for the independence of `Y` and `tr` conditional on `W`, by applying the `washb_permute` function to the residuals of an adjusted model (adjusted for `W`, but not `tr`). The permutation test can have even greater power against alternatives if it is conducted on residuals from an algorithmic fit that removes variability in the outcome due to characteristics other than treatment. The intuition is that by removing outcome variability that is not associated with the treatment of interest, we can more narrowly focus our inference on differences due to treatment. In the example below, we show how to use a simple linear regression to predict LAZ, and then conduct the permutation test on the residuals from the regression. Alternatively, a more flexible machine-learning approach, such as super learner, can be used in the initial prediction step.
Here is an example of an adjusted permutation test, using linear regression to adjust for covariates:
```r
# Use the washb_prescreen function to select adjustment covariates associated with the outcome
adj.W <- washb_prescreen(Y=lazd$laz, lazd[Wadj], family="gaussian", pval=0.2)

# Subset the LAZ dataset to the control and sanitation arms
# (the two arms in the contrast below)
laz.subset <- lazd[which(lazd$tr=="Control" | lazd$tr=="Sanitation"),]

# Subset the LAZ dataset to the selected adjustment covariates and LAZ,
# as well as tr and block, which will be needed in the permutation test
perm.adj.data <- subset(laz.subset, select=c(adj.W,"laz","tr","block"))

# Subset to complete cases
perm.adj.data <- perm.adj.data[complete.cases(perm.adj.data),]
Wselect <- subset(perm.adj.data, select=c(adj.W,"laz"))

# fit the glm model
fit <- glm(laz~., family="gaussian", data=Wselect)

# Use predict() to return predicted LAZ from the adjusted glm model,
# and subtract it from the observed LAZ outcome
residuals <- Wselect$laz - predict(fit)

# run the permutation test function on the residuals
permute.adj.C.S <- washb_permute(Y=residuals, tr=perm.adj.data$tr, pair=perm.adj.data$block,
                                 contrast=c("Control","Sanitation"),
                                 nreps=100000, seed=12345)
```
And here is an example of using an adaptive algorithm, super learner, to estimate the residuals in advance of the permutation test:
```r
# pre-screen the covariates for those associated with the outcome (LR test P<0.2)
# uses internal functions washb_prescreen and design_matrix

# subset to complete observations (no missing values)
adjpdata <- subset(lazd, select=c("laz","block","tr",Wadj))
adjpdata <- adjpdata[complete.cases(adjpdata),]
Wscreen <- washb_prescreen(Y=adjpdata$laz, Ws=adjpdata[Wadj], family="gaussian")

# convert adjustment covariates into a full design matrix
Wselect <- subset(adjpdata, select=Wscreen)
Wselect <- design_matrix(Wselect)

# algorithmic fit of the outcome as a function of selected covariates
set.seed(12345)
SLfit1 <- SuperLearner(Y=adjpdata$laz, X=Wselect, id=adjpdata$block,
                       family="gaussian",
                       SL.library=c("SL.mean","SL.glm","SL.bayesglm","SL.gam","SL.glmnet"))
SLfit1
adjpdata$pY <- as.vector(predict(SLfit1)$pred)
adjpdata$r <- adjpdata$laz - adjpdata$pY

# Test of the strong null hypothesis for differences between
# the sanitation and control arms in the LAZ distributions
permute.sl.C.S <- washb_permute(Y=adjpdata$r, tr=adjpdata$tr, pair=adjpdata$block,
                                contrast=c("Control","Sanitation"),
                                nreps=100000, seed=12345)
```
Note: the `washb_prescreen` function was written as an internal function (called by `washb_glm` and `washb_tmle`), but we have provided some additional details on how it works in case investigators wish to use it separately.
The `washb_prescreen` function selects covariates with univariate associations with the outcome of P < 0.2 (default) based on a likelihood ratio test. It is called as part of the `washb_glm` function to select adjustment covariates, but can also be run as an independent function. The `pval` argument can be used to alter the p-value threshold for inclusion.
Arguments:

* `Y` = outcome variable (continuous, such as LAZ, or binary, such as diarrhea)
* `Ws` = data frame that includes the candidate adjustment covariates to screen
* `family` = GLM model family (`"gaussian"`, `"binomial"`, `"poisson"`, `binomial(link='log')`, or `"neg.binom"` for negative binomial)
* `pval` = the p-value threshold: any variable with a likelihood ratio test p-value below this threshold will be returned. Defaults to 0.2.
* `print` = logical for whether to print function output, defaults to TRUE

Usage:

```r
washb_prescreen(Y=dd$diar7d, Ws=W, family=binomial(link='log'), pval=0.2, print=TRUE)
```
The `washb_prescreen` function performs a likelihood ratio test for each covariate in a set of potential covariates and returns those with an associated p-value < 0.2 (unless a custom p-value is specified). It can be called as a function on its own, and it is also called internally within the `washb_glm` function; unless otherwise specified by the user, `washb_glm` will therefore only include covariates with a likelihood ratio test p-value < 0.2 in the model. See the `washb_glm` section above for more details.
```r
prescreened_varnames <- washb_prescreen(Y=dd$diar7d, dd[Wadj],
                                        family=binomial(link='log'), pval=0.2)
```
The function runs a likelihood ratio test comparing a generalized linear model fit to each screened variable against a null model. The first section of the output lists all screened variables and the p-values from the likelihood ratio tests. The second section lists only those variables with a p-value below the threshold set with the `pval` argument (< 0.2 by default).
Use the following code to return the saved list of selected variable names, and then subset the covariate data frame to only those variables:

```r
prescreened_varnames
prescreened_vars <- subset(dd[Wadj], select=prescreened_varnames)

# Examine the first five observations of the second selected variable, child age in days:
prescreened_vars[1:5,2]
```
Note: the `washb_lincom` function was written as an internal function (called by `washb_glm`, above), but we have provided some additional details on how it works in case investigators wish to use it separately.

The subgroup analysis option in `washb_glm` internally calls the `washb_lincom` function to calculate estimates, SEs, CIs, and P-values from a linear combination of regression coefficients. The `washb_lincom` function can also be called on its own to calculate a custom linear combination of coefficients.
Arguments:
* lc: Index vector specifying the linear combination of coefficients from the GLM model.
* varlist: Character vector of variables to include. Alternative to lc; if lc is specified, this argument is ignored.
* fit: Model object returned from the coeftest command (class coeftest) within the washb_glm function. Accessed with $fit from washb_glm output.
* vcv: Variance-covariance matrix of the coefficients. Accessed with $vcv from washb_glm output.
* measure: Measure of effect. "RR" = risk ratio, "RD" = risk difference.
* flag: Internal argument used to flag and suppress printing if the washb_lincom function is called within another function.
Usage:
washb_lincom(lc=NULL,varlist=NULL,fit,vcv, measure="RR", flag=NULL)
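The underlying arithmetic is a standard linear combination of coefficients: the point estimate is the inner product of lc with the coefficient vector, and its variance is the quadratic form with the variance-covariance matrix. The sketch below (hypothetical function lincom_sketch; not the package source) illustrates this on the log scale for a ratio measure.

# Minimal sketch of the linear-combination arithmetic (assumed; not package source).
# beta = vector of GLM coefficients (log scale for family=binomial(link='log')),
# vcv  = variance-covariance matrix of the coefficients.
lincom_sketch <- function(lc, beta, vcv, measure="RR") {
  est <- sum(lc * beta)                      # point estimate: lc' beta
  se  <- drop(sqrt(t(lc) %*% vcv %*% lc))    # SE: sqrt(lc' V lc)
  p   <- 2*pnorm(-abs(est/se))               # two-sided P-value for Pr>|Z|
  ci  <- est + c(-1,1)*qnorm(0.975)*se       # 95% CI on the log scale
  if (measure=="RR") {
    c(PR=exp(est), ci.lb=exp(ci[1]), ci.ub=exp(ci[2]), se.logPR=se, P=p)
  } else {
    c(RD=est, ci.lb=ci[1], ci.ub=ci[2], se=se, P=p)
  }
}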
Below is an example of using the washb_lincom() function to externally replicate the index child/sibling subgroup analysis of the diarrheal disease outcome. First, recall the GLM model of diarrheal disease that contrasts the prevalence ratio between index children and siblings:
glm.C.S.byChildType$lincom
Next, use the washb_lincom function to verify the subgroup analysis calculated within the washb_glm() function. washb_lincom() takes as arguments the fit and variance-covariance matrix returned by washb_glm, and an index vector lc.
lc = vector indexing the coefficients to be linearly combined. If a single coefficient is chosen, the function returns the same coefficient and confidence interval as the GLM. Example:
# Create an lc vector of 0s equal in length to the number of coefficients from the glm model.
lc <- rep(0, nrow(glm.C.S.byChildType$fit))
# Examine the model coefficients (minus the pair-matched block estimates) to determine the position of the coefficients to combine.
glm.C.S.byChildType$fit[1:3,]
# Replace the second position in the vector with 1 (the position of the treatment coefficient in the model).
lc[2] <- 1
# Run the lincom function and compare output to the treatment effect from the GLM model.
washb_lincom(lc=lc, fit=glm.C.S.byChildType$fit, vcv=glm.C.S.byChildType$vcv, measure="RR")
Because sibling is the reference level of the tchild variable, running the washb_lincom() function with only the treatment contrast term marked in the lc index vector returns both the prevalence ratio of the trSanitation term and the prevalence ratio of the sibling subgroup in the glm.C.S.byChildType$lincom object. To estimate the PR for the target child subgroup, the interaction term trSanitation:VTarget child also needs to be marked in the lc index.
# Add a 1 at the 4th position in the lc vector to include the 4th coefficient, the interaction term, in the linear combination.
lc[4] <- 1
washb_lincom(lc=lc, fit=glm.C.S.byChildType$fit, vcv=glm.C.S.byChildType$vcv, measure="RR")
This estimate equals the target child PR returned by washb_glm when the V argument is specified. Note that the coefficient for the VTarget child term is not included in this linear combination. Including that term would estimate the prevalence ratio between sanitation arm target children and control arm siblings, whereas a subgroup analysis requires the prevalence ratio between treatment arms within, rather than across, the child-type groups.
# Below, run the diarrhea and LAZ primary outcome and subgroup analyses for all contrasts.
# Not included in the document, but output in the table at the bottom of the document.

# Diarrhea
unadj.glm.h1 <- t(sapply(h1.contrasts,washb_glm,Y=dd$diar7d,tr=dd$tr,pair=dd$block, W=NULL, forcedW=NULL, V=NULL, id=dd$block, family=binomial(link='log'), print=FALSE))
adj.glm.h1 <- t(sapply(h1.contrasts,washb_glm,Y=dd$diar7d,tr=dd$tr,pair=dd$block, W=dd[Wadj], forcedW=NULL, V=NULL, id=dd$block, family=binomial(link='log'), print=FALSE))
glm.byChildType <- lapply(h1.contrasts,washb_glm,Y=dd$diar7d,tr=dd$tr,pair=dd$block, W=dd["tchild"], forcedW=NULL, V="tchild", id=dd$block, family=binomial(link='log'), print=FALSE)

# LAZ: unadjusted GLM
unadj.glm.h1LAZ <- lapply(h1.contrasts,washb_glm,Y=lazd$laz,tr=lazd$tr,pair=lazd$block, W=NULL, forcedW=NULL, V=NULL, id=lazd$block, family="gaussian", print=FALSE)
# LAZ: adjusted GLM
adj.glm.h1LAZ <- lapply(h1.contrasts,washb_glm,Y=lazd$laz,tr=lazd$tr,pair=lazd$block, W=lazd[Wadj], forcedW=NULL, V=NULL, id=lazd$block, family="gaussian", print=FALSE)
# LAZ: unadjusted subgroup analysis by household food security
unadj.glm.byFoodSecurity <- lapply(h1.contrasts,washb_glm,Y=lazd$laz,tr=lazd$tr,pair=lazd$block, W=W_hfiacat, forcedW=NULL, V="hfiacat", id=lazd$block, family="gaussian", print=FALSE)
#adj.glm.h1LAZ[[5]]$TR
#unadj.glm.byFoodSecurity[[5]]$lincom
#Table of Primary Outcome Results

##7-day diarrheal disease recall outcome

Contrast v. control | Estimator | PR | 95% CI | SE logPR | P-value
--------------------|-----------|----|--------|----------|--------
Water | Unadjusted MH | `r round(diff.h1[1,1],3)` |(`r round((diff.h1[1,2:3]),3)`) | `r round(diff.h1[1,5],3)` | `r round(diff.h1[1,7],3)`
Water | Unadjusted GLM | `r round(unadj.glm.h1[[1]][1],3)` |(`r round(unadj.glm.h1[[1]][2:3],3)`) | `r round(unadj.glm.h1[[1]][5],3)` | `r round(unadj.glm.h1[[1]][7],3)`
Water | Adjusted GLM | `r round(adj.glm.h1[[1]][1],3)` |(`r round(adj.glm.h1[[1]][2:3],3)`) | `r round(adj.glm.h1[[1]][5],3)` | `r round(adj.glm.h1[[1]][7],3)`
Water | Adjusted GLM + TMLE | | | |
Water | Adjusted SL + TMLE | | | |
Water | Wilcoxon permutation test | | | |`r #permute.diff.h1[1]`
-----------------------------------------|--------------------------------------|---------------|-----------------------|------------|-----------
Sanitation | Unadjusted MH | `r round(diff.h1[2,1],3)` |(`r round((diff.h1[2,2:3]),3)`) | `r round(diff.h1[2,5],3)` | `r round(diff.h1[2,7],3)`
Sanitation | Unadjusted GLM | `r round(unadj.glm.h1[[2]][1],3)` |(`r round(unadj.glm.h1[[2]][2:3],3)`) | `r round(unadj.glm.h1[[2]][5],3)` | `r round(unadj.glm.h1[[2]][7],3)`
Sanitation | Adjusted GLM | `r round(adj.glm.h1[[2]][1],3)` |(`r round(adj.glm.h1[[2]][2:3],3)`) | `r round(adj.glm.h1[[2]][5],3)` | `r round(adj.glm.h1[[2]][7],3)`
Sanitation | Adjusted GLM + TMLE | | | |
Sanitation | Adjusted SL + TMLE | | | |
Sanitation | Wilcoxon permutation test | | | |`r #permute.diff.h1[2]`
-----------------------------------------|--------------------------------------|---------------|-----------------------|------------|-----------
Handwashing | Unadjusted MH | `r round(diff.h1[3,1],3)` |(`r round((diff.h1[3,2:3]),3)`) | `r round(diff.h1[3,5],3)` | `r round(diff.h1[3,7],3)`
Handwashing | Unadjusted GLM | `r round(unadj.glm.h1[[3]][1],3)` |(`r round(unadj.glm.h1[[3]][2:3],3)`) | `r round(unadj.glm.h1[[3]][5],3)` | `r round(unadj.glm.h1[[3]][7],3)`
Handwashing | Adjusted GLM | `r round(adj.glm.h1[[3]][1],3)` |(`r round(adj.glm.h1[[3]][2:3],3)`) | `r round(adj.glm.h1[[3]][5],3)` | `r round(adj.glm.h1[[3]][7],3)`
Handwashing | Adjusted GLM + TMLE | | | |
Handwashing | Adjusted SL + TMLE | | | |
Handwashing | Wilcoxon permutation test | | | |`r #permute.diff.h1[3]`
-----------------------------------------|--------------------------------------|---------------|-----------------------|------------|-----------
WSH | Unadjusted MH | `r round(diff.h1[4,1],3)` |(`r round((diff.h1[4,2:3]),3)`) | `r round(diff.h1[4,5],3)` | `r round(diff.h1[4,7],3)`
WSH | Unadjusted GLM | `r round(unadj.glm.h1[[4]][1],3)` |(`r round(unadj.glm.h1[[4]][2:3],3)`) | `r round(unadj.glm.h1[[4]][5],3)` | `r round(unadj.glm.h1[[4]][7],3)`
WSH | Adjusted GLM | `r round(adj.glm.h1[[4]][1],3)` |(`r round(adj.glm.h1[[4]][2:3],3)`) | `r round(adj.glm.h1[[4]][5],3)` | `r round(adj.glm.h1[[4]][7],3)`
WSH | Adjusted GLM + TMLE | | | |
WSH | Adjusted SL + TMLE | | | |
WSH | Wilcoxon permutation test | | | |`r #permute.diff.h1[4]`
-----------------------------------------|--------------------------------------|---------------|-----------------------|------------|-----------
Nutrition | Unadjusted MH | `r round(diff.h1[5,1],3)` |(`r round((diff.h1[5,2:3]),3)`) | `r round(diff.h1[5,5],3)` | `r round(diff.h1[5,7],3)`
Nutrition | Unadjusted GLM | `r round(unadj.glm.h1[[5]][1],3)` |(`r round(unadj.glm.h1[[5]][2:3],3)`) | `r round(unadj.glm.h1[[5]][5],3)` | `r round(unadj.glm.h1[[5]][7],3)`
Nutrition | Adjusted GLM | `r round(adj.glm.h1[[5]][1],3)` |(`r round(adj.glm.h1[[5]][2:3],3)`) | `r round(adj.glm.h1[[5]][5],3)` | `r round(adj.glm.h1[[5]][7],3)`
Nutrition | Adjusted GLM + TMLE | | | |
Nutrition | Adjusted SL + TMLE | | | |
Nutrition | Wilcoxon permutation test | | | |`r #permute.diff.h1[5]`
-----------------------------------------|--------------------------------------|---------------|-----------------------|------------|-----------
WSH+Nutrition | Unadjusted MH | `r round(diff.h1[6,1],3)` |(`r round((diff.h1[6,2:3]),3)`) | `r round(diff.h1[6,5],3)` | `r round(diff.h1[6,7],3)`
WSH+Nutrition | Unadjusted GLM | `r round(unadj.glm.h1[[6]][1],3)` |(`r round(unadj.glm.h1[[6]][2:3],3)`) | `r round(unadj.glm.h1[[6]][5],3)` | `r round(unadj.glm.h1[[6]][7],3)`
WSH+Nutrition | Adjusted GLM | `r round(adj.glm.h1[[6]][1],3)` |(`r round(adj.glm.h1[[6]][2:3],3)`) | `r round(adj.glm.h1[[6]][5],3)` | `r round(adj.glm.h1[[6]][7],3)`
WSH+Nutrition | Adjusted GLM + TMLE | | | |
WSH+Nutrition | Adjusted SL + TMLE | | | |
WSH+Nutrition | Wilcoxon permutation test | | | |`r #permute.diff.h1[6]`

<br>
<br>
<br>

##Length-for-Age Z-score outcome

Contrast v. control | Estimator | Coef | 95% CI | SE | P-value
--------------------|-----------|------|--------|-----|--------
Water | Unadjusted ttest | `r round(diff.h1LAZ[1,1],3)` |(`r round((diff.h1LAZ[1,2:3]),3)`) | t-stat: `r round(diff.h1LAZ[1,4],3)`| `r round(diff.h1LAZ[1,5],3)`
Water | Unadjusted GLM | `r round(unadj.glm.h1LAZ[[1]]$TR[1],3)` |(`r round(unadj.glm.h1LAZ[[1]]$TR[2:3],3)`) |SE: `r round(unadj.glm.h1LAZ[[1]]$TR[4],3)` | `r round(unadj.glm.h1LAZ[[1]]$TR[6],3)`
Water | Adjusted GLM | `r round(adj.glm.h1LAZ[[1]]$TR[1],3)` |(`r round(adj.glm.h1LAZ[[1]]$TR[2:3],3)`) |SE: `r round(adj.glm.h1LAZ[[1]]$TR[4],3)` | `r round(adj.glm.h1LAZ[[1]]$TR[6],3)`
Water | Adjusted GLM + TMLE | | | |
Water | Adjusted SL + TMLE | | | |
Water | Wilcoxon permutation test | | | |
-----------------------------------------|--------------------------------------|---------------|--------------|------------|-------------
Sanitation | Unadjusted ttest | `r round(diff.h1LAZ[2,1],3)` |(`r round((diff.h1LAZ[2,2:3]),3)`) | t-stat: `r round(diff.h1LAZ[2,4],3)`| `r round(diff.h1LAZ[2,5],3)`
Sanitation | Unadjusted GLM | `r round(unadj.glm.h1LAZ[[2]]$TR[1],3)` |(`r round(unadj.glm.h1LAZ[[2]]$TR[2:3],3)`) |SE: `r round(unadj.glm.h1LAZ[[2]]$TR[4],3)` | `r round(unadj.glm.h1LAZ[[2]]$TR[6],3)`
Sanitation | Adjusted GLM | `r round(adj.glm.h1LAZ[[2]]$TR[1],3)` |(`r round(adj.glm.h1LAZ[[2]]$TR[2:3],3)`) |SE: `r round(adj.glm.h1LAZ[[2]]$TR[4],3)` | `r round(adj.glm.h1LAZ[[2]]$TR[6],3)`
Sanitation | Adjusted GLM + TMLE | | | |
Sanitation | Adjusted SL + TMLE | | | |
Sanitation | Wilcoxon permutation test | | | |
-----------------------------------------|--------------------------------------|---------------|--------------|------------|-------------
Handwashing | Unadjusted ttest | `r round(diff.h1LAZ[3,1],3)` |(`r round((diff.h1LAZ[3,2:3]),3)`) | t-stat: `r round(diff.h1LAZ[3,4],3)`| `r round(diff.h1LAZ[3,5],3)`
Handwashing | Unadjusted GLM | `r round(unadj.glm.h1LAZ[[3]]$TR[1],3)` |(`r round(unadj.glm.h1LAZ[[3]]$TR[2:3],3)`) |SE: `r round(unadj.glm.h1LAZ[[3]]$TR[4],3)` | `r round(unadj.glm.h1LAZ[[3]]$TR[6],3)`
Handwashing | Adjusted GLM | `r round(adj.glm.h1LAZ[[3]]$TR[1],3)` |(`r round(adj.glm.h1LAZ[[3]]$TR[2:3],3)`) |SE: `r round(adj.glm.h1LAZ[[3]]$TR[4],3)` | `r round(adj.glm.h1LAZ[[3]]$TR[6],3)`
Handwashing | Adjusted GLM + TMLE | | | |
Handwashing | Adjusted SL + TMLE | | | |
Handwashing | Wilcoxon permutation test | | | |
-----------------------------------------|--------------------------------------|---------------|--------------|------------|-------------
WSH | Unadjusted ttest | `r round(diff.h1LAZ[4,1],3)` |(`r round((diff.h1LAZ[4,2:3]),3)`) | t-stat: `r round(diff.h1LAZ[4,4],3)`| `r round(diff.h1LAZ[4,5],3)`
WSH | Unadjusted GLM | `r round(unadj.glm.h1LAZ[[4]]$TR[1],3)` |(`r round(unadj.glm.h1LAZ[[4]]$TR[2:3],3)`) |SE: `r round(unadj.glm.h1LAZ[[4]]$TR[4],3)` | `r round(unadj.glm.h1LAZ[[4]]$TR[6],3)`
WSH | Adjusted GLM | `r round(adj.glm.h1LAZ[[4]]$TR[1],3)` |(`r round(adj.glm.h1LAZ[[4]]$TR[2:3],3)`) |SE: `r round(adj.glm.h1LAZ[[4]]$TR[4],3)` | `r round(adj.glm.h1LAZ[[4]]$TR[6],3)`
WSH | Adjusted GLM + TMLE | | | |
WSH | Adjusted SL + TMLE | | | |
WSH | Wilcoxon permutation test | | | |
-----------------------------------------|--------------------------------------|---------------|--------------|------------|-------------
Nutrition | Unadjusted ttest | `r round(diff.h1LAZ[5,1],3)` |(`r round((diff.h1LAZ[5,2:3]),3)`) | t-stat: `r round(diff.h1LAZ[5,4],3)`| `r round(diff.h1LAZ[5,5],3)`
Nutrition | Unadjusted GLM | `r round(unadj.glm.h1LAZ[[5]]$TR[1],3)` |(`r round(unadj.glm.h1LAZ[[5]]$TR[2:3],3)`) |SE: `r round(unadj.glm.h1LAZ[[5]]$TR[4],3)` | `r round(unadj.glm.h1LAZ[[5]]$TR[6],3)`
Nutrition | Adjusted GLM | `r round(adj.glm.h1LAZ[[5]]$TR[1],3)` |(`r round(adj.glm.h1LAZ[[5]]$TR[2:3],3)`) |SE: `r round(adj.glm.h1LAZ[[5]]$TR[4],3)` | `r round(adj.glm.h1LAZ[[5]]$TR[6],3)`
Nutrition | Adjusted GLM + TMLE | | | |
Nutrition | Adjusted SL + TMLE | | | |
Nutrition | Wilcoxon permutation test | | | |
-----------------------------------------|--------------------------------------|---------------|--------------|------------|-------------
WSH+Nutrition | Unadjusted ttest | `r round(diff.h1LAZ[6,1],3)` |(`r round((diff.h1LAZ[6,2:3]),3)`) | t-stat: `r round(diff.h1LAZ[6,4],3)`| `r round(diff.h1LAZ[6,5],3)`
WSH+Nutrition | Unadjusted GLM | `r round(unadj.glm.h1LAZ[[6]]$TR[1],3)` |(`r round(unadj.glm.h1LAZ[[6]]$TR[2:3],3)`) |SE: `r round(unadj.glm.h1LAZ[[6]]$TR[4],3)` | `r round(unadj.glm.h1LAZ[[6]]$TR[6],3)`
WSH+Nutrition | Adjusted GLM | `r round(adj.glm.h1LAZ[[6]]$TR[1],3)` |(`r round(adj.glm.h1LAZ[[6]]$TR[2:3],3)`) |SE: `r round(adj.glm.h1LAZ[[6]]$TR[4],3)` | `r round(adj.glm.h1LAZ[[6]]$TR[6],3)`
-----------------------------------------|--------------------------------------|---------------|--------------|------------|-------------

<br>
<br>
<br>

##Diarrheal disease comparisons between treatments and control, stratified into target children and siblings.

Contrast v. control | Child Type | Coefficient | 95% CI | P-value
--------------------|------------|-------------|----------|---------
Water |`r glm.byChildType[[1]]$lincom[1,1]` | `r glm.byChildType[[1]]$lincom[1,2]` | `r round(glm.byChildType[[1]]$lincom[1,4:5],3)` | `r glm.byChildType[[1]]$lincom[1,7]`
- |`r glm.byChildType[[1]]$lincom[2,1]` | `r glm.byChildType[[1]]$lincom[2,2]` | `r round(glm.byChildType[[1]]$lincom[2,4:5],3)` | `r glm.byChildType[[1]]$lincom[2,7]`
Sanitation |`r glm.byChildType[[2]]$lincom[1,1]` | `r glm.byChildType[[2]]$lincom[1,2]` | `r round(glm.byChildType[[2]]$lincom[1,4:5],3)` | `r glm.byChildType[[2]]$lincom[1,7]`
- |`r glm.byChildType[[2]]$lincom[2,1]` | `r glm.byChildType[[2]]$lincom[2,2]` | `r round(glm.byChildType[[2]]$lincom[2,4:5],3)` | `r glm.byChildType[[2]]$lincom[2,7]`
Handwashing |`r glm.byChildType[[3]]$lincom[1,1]` | `r glm.byChildType[[3]]$lincom[1,2]` | `r round(glm.byChildType[[3]]$lincom[1,4:5],3)` | `r glm.byChildType[[3]]$lincom[1,7]`
- |`r glm.byChildType[[3]]$lincom[2,1]` | `r glm.byChildType[[3]]$lincom[2,2]` | `r round(glm.byChildType[[3]]$lincom[2,4:5],3)` | `r glm.byChildType[[3]]$lincom[2,7]`
WSH |`r glm.byChildType[[4]]$lincom[1,1]` | `r glm.byChildType[[4]]$lincom[1,2]` | `r round(glm.byChildType[[4]]$lincom[1,4:5],3)` | `r glm.byChildType[[4]]$lincom[1,7]`
- |`r glm.byChildType[[4]]$lincom[2,1]` | `r glm.byChildType[[4]]$lincom[2,2]` | `r round(glm.byChildType[[4]]$lincom[2,4:5],3)` | `r glm.byChildType[[4]]$lincom[2,7]`
Nutrition |`r glm.byChildType[[5]]$lincom[1,1]` | `r glm.byChildType[[5]]$lincom[1,2]` | `r round(glm.byChildType[[5]]$lincom[1,4:5],3)` | `r glm.byChildType[[5]]$lincom[1,7]`
- |`r glm.byChildType[[5]]$lincom[2,1]` | `r glm.byChildType[[5]]$lincom[2,2]` | `r round(glm.byChildType[[5]]$lincom[2,4:5],3)` | `r glm.byChildType[[5]]$lincom[2,7]`
WSH+Nutrition |`r glm.byChildType[[6]]$lincom[1,1]` | `r glm.byChildType[[6]]$lincom[1,2]` | `r round(glm.byChildType[[6]]$lincom[1,4:5],3)` | `r glm.byChildType[[6]]$lincom[1,7]`
- |`r glm.byChildType[[6]]$lincom[2,1]` | `r glm.byChildType[[6]]$lincom[2,2]` | `r round(glm.byChildType[[6]]$lincom[2,4:5],3)` | `r glm.byChildType[[6]]$lincom[2,7]`

<br>
<br>
<br>

##Risk difference between treatments and control, stratified into target children and siblings.

Contrast v. control | Child Type | Coefficient | 95% CI | P-value
--------------------|------------|-------------|----------|-----------
Water |`r glm.byChildType[[1]]$lincomRD[1,1]` | `r glm.byChildType[[1]]$lincomRD[1,2]` | `r round(glm.byChildType[[1]]$lincomRD[1,4:5],3)` | `r glm.byChildType[[1]]$lincomRD[1,7]`
- |`r glm.byChildType[[1]]$lincomRD[2,1]` | `r glm.byChildType[[1]]$lincomRD[2,2]` | `r round(glm.byChildType[[1]]$lincomRD[2,4:5],3)` | `r glm.byChildType[[1]]$lincomRD[2,7]`
Sanitation |`r glm.byChildType[[2]]$lincomRD[1,1]` | `r glm.byChildType[[2]]$lincomRD[1,2]` | `r round(glm.byChildType[[2]]$lincomRD[1,4:5],3)` | `r glm.byChildType[[2]]$lincomRD[1,7]`
- |`r glm.byChildType[[2]]$lincomRD[2,1]` | `r glm.byChildType[[2]]$lincomRD[2,2]` | `r round(glm.byChildType[[2]]$lincomRD[2,4:5],3)` | `r glm.byChildType[[2]]$lincomRD[2,7]`
Handwashing |`r glm.byChildType[[3]]$lincomRD[1,1]` | `r glm.byChildType[[3]]$lincomRD[1,2]` | `r round(glm.byChildType[[3]]$lincomRD[1,4:5],3)` | `r glm.byChildType[[3]]$lincomRD[1,7]`
- |`r glm.byChildType[[3]]$lincomRD[2,1]` | `r glm.byChildType[[3]]$lincomRD[2,2]` | `r round(glm.byChildType[[3]]$lincomRD[2,4:5],3)` | `r glm.byChildType[[3]]$lincomRD[2,7]`
WSH |`r glm.byChildType[[4]]$lincomRD[1,1]` | `r glm.byChildType[[4]]$lincomRD[1,2]` | `r round(glm.byChildType[[4]]$lincomRD[1,4:5],3)` | `r glm.byChildType[[4]]$lincomRD[1,7]`
- |`r glm.byChildType[[4]]$lincomRD[2,1]` | `r glm.byChildType[[4]]$lincomRD[2,2]` | `r round(glm.byChildType[[4]]$lincomRD[2,4:5],3)` | `r glm.byChildType[[4]]$lincomRD[2,7]`
Nutrition |`r glm.byChildType[[5]]$lincomRD[1,1]` | `r glm.byChildType[[5]]$lincomRD[1,2]` | `r round(glm.byChildType[[5]]$lincomRD[1,4:5],3)` | `r glm.byChildType[[5]]$lincomRD[1,7]`
- |`r glm.byChildType[[5]]$lincomRD[2,1]` | `r glm.byChildType[[5]]$lincomRD[2,2]` | `r round(glm.byChildType[[5]]$lincomRD[2,4:5],3)` | `r glm.byChildType[[5]]$lincomRD[2,7]`
WSH + Nutrition |`r glm.byChildType[[6]]$lincomRD[1,1]` | `r glm.byChildType[[6]]$lincomRD[1,2]` | `r round(glm.byChildType[[6]]$lincomRD[1,4:5],3)` | `r glm.byChildType[[6]]$lincomRD[1,7]`
- |`r glm.byChildType[[6]]$lincomRD[2,1]` | `r glm.byChildType[[6]]$lincomRD[2,2]` | `r round(glm.byChildType[[6]]$lincomRD[2,4:5],3)` | `r glm.byChildType[[6]]$lincomRD[2,7]`

<br>
<br>
<br>

##Food Security Subgroup

###LAZ comparisons between treatment and control arms, stratified by food security level.

Results from the washb_glm function run above with `V="hfiacat"`.

Contrast v. control | Food Security Subgroup | Coefficient | 95% CI | P-value
--------------------|------------------------|-------------|----------|---------
Nutrition |`r unadj.glm.byFoodSecurity[[5]]$lincom[1,1]` | `r unadj.glm.byFoodSecurity[[5]]$lincom[1,2]` | `r round(unadj.glm.byFoodSecurity[[5]]$lincom[1,4:5],3)` | `r unadj.glm.byFoodSecurity[[5]]$lincom[1,7]`
- |`r unadj.glm.byFoodSecurity[[5]]$lincom[2,1]` | `r unadj.glm.byFoodSecurity[[5]]$lincom[2,2]` | `r round(unadj.glm.byFoodSecurity[[5]]$lincom[2,4:5],3)` | `r unadj.glm.byFoodSecurity[[5]]$lincom[2,7]`
- |`r unadj.glm.byFoodSecurity[[5]]$lincom[3,1]` | `r unadj.glm.byFoodSecurity[[5]]$lincom[3,2]` | `r round(unadj.glm.byFoodSecurity[[5]]$lincom[3,4:5],3)` | `r unadj.glm.byFoodSecurity[[5]]$lincom[3,7]`
- |`r unadj.glm.byFoodSecurity[[5]]$lincom[4,1]` | `r unadj.glm.byFoodSecurity[[5]]$lincom[4,2]` | `r round(unadj.glm.byFoodSecurity[[5]]$lincom[4,4:5],3)` | `r unadj.glm.byFoodSecurity[[5]]$lincom[4,7]`
Water |`r unadj.glm.byFoodSecurity[[1]]$lincom[1,1]` | `r unadj.glm.byFoodSecurity[[1]]$lincom[1,2]` | `r round(unadj.glm.byFoodSecurity[[1]]$lincom[1,4:5],3)` | `r unadj.glm.byFoodSecurity[[1]]$lincom[1,7]`
- |`r unadj.glm.byFoodSecurity[[1]]$lincom[2,1]` | `r unadj.glm.byFoodSecurity[[1]]$lincom[2,2]` | `r round(unadj.glm.byFoodSecurity[[1]]$lincom[2,4:5],3)` | `r unadj.glm.byFoodSecurity[[1]]$lincom[2,7]`
- |`r unadj.glm.byFoodSecurity[[1]]$lincom[3,1]` | `r unadj.glm.byFoodSecurity[[1]]$lincom[3,2]` | `r round(unadj.glm.byFoodSecurity[[1]]$lincom[3,4:5],3)` | `r unadj.glm.byFoodSecurity[[1]]$lincom[3,7]`
- |`r unadj.glm.byFoodSecurity[[1]]$lincom[4,1]` | `r unadj.glm.byFoodSecurity[[1]]$lincom[4,2]` | `r round(unadj.glm.byFoodSecurity[[1]]$lincom[4,4:5],3)` | `r unadj.glm.byFoodSecurity[[1]]$lincom[4,7]`
Sanitation |`r unadj.glm.byFoodSecurity[[2]]$lincom[1,1]` | `r unadj.glm.byFoodSecurity[[2]]$lincom[1,2]` | `r round(unadj.glm.byFoodSecurity[[2]]$lincom[1,4:5],3)` | `r unadj.glm.byFoodSecurity[[2]]$lincom[1,7]`
- |`r unadj.glm.byFoodSecurity[[2]]$lincom[2,1]` | `r unadj.glm.byFoodSecurity[[2]]$lincom[2,2]` | `r round(unadj.glm.byFoodSecurity[[2]]$lincom[2,4:5],3)` | `r unadj.glm.byFoodSecurity[[2]]$lincom[2,7]`
- |`r unadj.glm.byFoodSecurity[[2]]$lincom[3,1]` | `r unadj.glm.byFoodSecurity[[2]]$lincom[3,2]` | `r round(unadj.glm.byFoodSecurity[[2]]$lincom[3,4:5],3)` | `r unadj.glm.byFoodSecurity[[2]]$lincom[3,7]`
- |`r unadj.glm.byFoodSecurity[[2]]$lincom[4,1]` | `r unadj.glm.byFoodSecurity[[2]]$lincom[4,2]` | `r round(unadj.glm.byFoodSecurity[[2]]$lincom[4,4:5],3)` | `r unadj.glm.byFoodSecurity[[2]]$lincom[4,7]`
Handwashing |`r unadj.glm.byFoodSecurity[[3]]$lincom[1,1]` | `r unadj.glm.byFoodSecurity[[3]]$lincom[1,2]` | `r round(unadj.glm.byFoodSecurity[[3]]$lincom[1,4:5],3)` | `r unadj.glm.byFoodSecurity[[3]]$lincom[1,7]`
- |`r unadj.glm.byFoodSecurity[[3]]$lincom[2,1]` | `r unadj.glm.byFoodSecurity[[3]]$lincom[2,2]` | `r round(unadj.glm.byFoodSecurity[[3]]$lincom[2,4:5],3)` | `r unadj.glm.byFoodSecurity[[3]]$lincom[2,7]`
- |`r unadj.glm.byFoodSecurity[[3]]$lincom[3,1]` | `r unadj.glm.byFoodSecurity[[3]]$lincom[3,2]` | `r round(unadj.glm.byFoodSecurity[[3]]$lincom[3,4:5],3)` | `r unadj.glm.byFoodSecurity[[3]]$lincom[3,7]`
- |`r unadj.glm.byFoodSecurity[[3]]$lincom[4,1]` | `r unadj.glm.byFoodSecurity[[3]]$lincom[4,2]` | `r round(unadj.glm.byFoodSecurity[[3]]$lincom[4,4:5],3)` | `r unadj.glm.byFoodSecurity[[3]]$lincom[4,7]`
WSH |`r unadj.glm.byFoodSecurity[[4]]$lincom[1,1]` | `r unadj.glm.byFoodSecurity[[4]]$lincom[1,2]` | `r round(unadj.glm.byFoodSecurity[[4]]$lincom[1,4:5],3)` | `r unadj.glm.byFoodSecurity[[4]]$lincom[1,7]`
- |`r unadj.glm.byFoodSecurity[[4]]$lincom[2,1]` | `r unadj.glm.byFoodSecurity[[4]]$lincom[2,2]` | `r round(unadj.glm.byFoodSecurity[[4]]$lincom[2,4:5],3)` | `r unadj.glm.byFoodSecurity[[4]]$lincom[2,7]`
- |`r unadj.glm.byFoodSecurity[[4]]$lincom[3,1]` | `r unadj.glm.byFoodSecurity[[4]]$lincom[3,2]` | `r round(unadj.glm.byFoodSecurity[[4]]$lincom[3,4:5],3)` | `r unadj.glm.byFoodSecurity[[4]]$lincom[3,7]`
- |`r unadj.glm.byFoodSecurity[[4]]$lincom[4,1]` | `r unadj.glm.byFoodSecurity[[4]]$lincom[4,2]` | `r round(unadj.glm.byFoodSecurity[[4]]$lincom[4,4:5],3)` | `r unadj.glm.byFoodSecurity[[4]]$lincom[4,7]`
WSH + Nutrition |`r unadj.glm.byFoodSecurity[[6]]$lincom[1,1]` | `r unadj.glm.byFoodSecurity[[6]]$lincom[1,2]` | `r round(unadj.glm.byFoodSecurity[[6]]$lincom[1,4:5],3)` | `r unadj.glm.byFoodSecurity[[6]]$lincom[1,7]`
- |`r unadj.glm.byFoodSecurity[[6]]$lincom[2,1]` | `r unadj.glm.byFoodSecurity[[6]]$lincom[2,2]` | `r round(unadj.glm.byFoodSecurity[[6]]$lincom[2,4:5],3)` | `r unadj.glm.byFoodSecurity[[6]]$lincom[2,7]`
- |`r unadj.glm.byFoodSecurity[[6]]$lincom[3,1]` | `r unadj.glm.byFoodSecurity[[6]]$lincom[3,2]` | `r round(unadj.glm.byFoodSecurity[[6]]$lincom[3,4:5],3)` | `r unadj.glm.byFoodSecurity[[6]]$lincom[3,7]`
- |`r unadj.glm.byFoodSecurity[[6]]$lincom[4,1]` | `r unadj.glm.byFoodSecurity[[6]]$lincom[4,2]` | `r round(unadj.glm.byFoodSecurity[[6]]$lincom[4,4:5],3)` | `r unadj.glm.byFoodSecurity[[6]]$lincom[4,7]`