washb_glm: Generalized linear model function for the WASH Benefits study
In ben-arnold/washb: Internal WASH Benefits Function Package

washb_glm

R Documentation

Generalized linear model function for the WASH Benefits study

Description

Estimate intention-to-treat parameters with generalized linear models and robust standard errors

Usage

washb_glm(Y,tr,pair,W=NULL, forcedW=NULL, V=NULL, id,contrast,family="gaussian", pval=0.2, print=TRUE)

Arguments

`Y`	Outcome variable (continuous, such as LAZ, or binary, such as diarrhea)
`tr`	Binary treatment group variable (ideally a factor), comparison group first
`pair`	Pair-matched randomization ID variable (in WASH Benefits: block)
`W`	Optional data frame that includes adjustment covariates (for adjusted estimates)
`forcedW`	Optional vector of variable names to force as adjustment covariates (no screening)
`V`	Optional variable name for subgroup analyses, which is interacted with 'tr'.
`id`	ID variable for independent units (cluster ID)
`contrast`	Vector of length 2 that includes the groups to contrast, e.g., c("Control","Water")
`family`	GLM model family (gaussian, binomial, poisson, and negative binomial). Use `family=binonial(link='log')` (no quotes) to return prevalence ratios instead of odds ratios when the outcome is binary. Use "neg.binom" for a Negative binomial model.
`pval`	The p-value threshold: any variables with a p-value from the likelihood ratio test below this threshold will be returned. Defaults to 0.2
`print`	Logical for whether to print function output, defaults to TRUE.
`verbose`	Logical for whether to print names and descriptions of returned list objects
`FECR`	(default is `NULL`). Estimate the fecal egg count reduction (FECR) proportion by specifying either `FECR='arithmetic'` to estimate it on the artithmetic mean scale or `FECR='geometric'` on the geometric mean scale. If `FECR='geometric'` ensure that you use log-transformed eggs per gram. When estimating the FECR, also ensure that you specify `family='gaussian'` (see details).

Details

washb_glm fits a generalized linear model to estimate intention-to-treat (ITT) effects in a trial. The contrast argument enables you to specify the arms that you wish to compare (reference group in the first argument, comparison group in the second). To estimate adjusted effects, you can pass a data.frame of adjustment covariates to the W argument – by default, covariates are pre-screened to only include those that are associated with the outcome based on a likelihood ratio test. To over-ride the pre-screening algorithm for some or all covariates, use the forcedW argment.

If the design is pair-matched (all primary outcome analyses in WASH B should be pair matched), use the pair argument to specify an id variable for pairs, and specify the same variable in the id argument to get correct SEs. If the design is not pair-matched, then the id argument should identify the smallest independent unit in the trial (e.g., cluster). The function computes robust standard errors. Note that this function automatically drops observations from the analysis if they are from a pair with no treatment contrast – this can happen with incomplete randomization blocks in the Kenya trial.

Note that for binary outcomes such as diarrhea, the glm model is fit with a log link, family=binomial(link='log'), to estimate prevalence ratios rather than the canonical logit link, family='binomial', to estimate the odds ratio. Occasionally, a glm model with a non-canonical link function like family=binomial(link='log') will fail to converge, particularly if the data are sparse. If this occurs, use a modified poisson regression to estimate prevalence ratio using the argument family=poisson(link='log'). See Zou 2004 (https://www.ncbi.nlm.nih.gov/pubmed/15033648) for details.

The function also makes it straight forward to estimate conditional (i.e., subgroup) effects using the optional V argument.

Value

Returns a list of the risk ratios or risk differences, the variance-covariance matrix, and a vector indexing the rows of observations used to fit the glm model

Examples


#washb_glm function applied to the Bangladesh diarrheal disease outcome to determine both unadjusted and adjusted
#prevalence ratios between intervention and control arms.


#Cleans and merge the enrollment and diarrhea data:
library(washb)
data(washb_bangladesh_enrol)
washb_bangladesh_enrol <- washb_bangladesh_enrol
data(washb_bangladesh_diar)
washb_bangladesh_diar <- washb_bangladesh_diar

# drop svydate and month because they are superceded in the child level diarrhea data
#washb_bangladesh_enrol$svydate <- NULL
#washb_bangladesh_enrol$month <- NULL

# merge the baseline dataset to the follow-up dataset
ad <- merge(washb_bangladesh_enrol,washb_bangladesh_diar,by=c("dataid","clusterid","block","tr"),all.x=F,all.y=T)

# subset to the relevant measurement
# Year 1 or Year 2
ad <- subset(ad,svy==1|svy==2)

#subset the diarrhea to children <36 mos at enrollment
### (exlude new births that are not target children)
ad <- subset(ad,sibnewbirth==0)
ad <- subset(ad,gt36mos==0)

# Exclude children with missing data
ad <- subset(ad,!is.na(ad$diar7d))

#Re-order the tr factor for convenience
ad$tr <- factor(ad$tr,levels=c("Control","Water","Sanitation","Handwashing","WSH","Nutrition","Nutrition + WSH"))

#Ensure that month is coded as a factor
ad$month <- factor(ad$month)

#Sort the data for perfect replication when using V-fold cross-validation
ad <- ad[order(ad$block,ad$clusterid,ad$dataid,ad$childid),]

###Create vector of contrasts for each hypothesis to facilitate comparisons between arms.
#Hypothesis 1: Each intervention arm vs. Control
h1.contrasts <- list(
  c("Control","Water"),
  c("Control","Sanitation"),
  c("Control","Handwashing"),
  c("Control","WSH"),
  c("Control","Nutrition"),
  c("Control","Nutrition + WSH")
)



###Unadjusted GLM estimates for diarrheal disease outcome.
#As an example, the following code applies the washb_glm function to compare 7-day recall diarrheal disease prevalence between the sanitation and control arms. Notice that the glm model is fit with a log link, `family=binomial(link='log')`, to estimate prevalence ratios rather than with a logit link, `family="binomial"`, to estimate the odds ratio.

Diar.glm.C.S <- washb_glm(Y=ad$diar7d,tr=ad$tr,pair=ad$block, id=ad$clusterid, contrast=c("Control","Sanitation"), family=binomial(link='log'))

#On top of the function's auto-printed output, the washb_glm function contains a number of objects. For example, `'objectname'$vcv` returns the variance-covariance matrix.


#All returned objects are:
#'objectname$fit` to return full glm model estimates.
#'objectname$vcv` to return the variance-covariance matrix.
#'objectname$rowdropped` to return the vector list of observations included in the model fit.
#'objectname$lincom` to return subgroup-specific conditional relative risk estimates if a subgroup V is specified.

ben-arnold/washb documentation built on Dec. 11, 2023, 7:06 p.m.