View source: R/washb_prescreen.R
washb_prescreen | R Documentation |
Pre-screen covariates using a likelihood ratio test.
washb_prescreen(Y, Ws, family = "gaussian", pval = 0.2, print = TRUE)
Y |
Outcome variable (continuous, such as LAZ, or binary, such as diarrhea) |
Ws |
data frame that includes candidate adjustment covariates to screen |
family |
GLM model family (gaussian, binomial, poisson, or negative binomial). Use "neg.binom" for Negative binomial. |
pval |
The p-value threshold: any variables with a p-value from the likelihood ratio test below this threshold will be returned. Defaults to 0.2. |
print |
Logical for whether to print function output, defaults to TRUE. |
Function returns the list of variable names whose likelihood ratio test p-value is below the pval threshold (<0.2 unless a custom p-value is specified).
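The screening idea can be sketched in base R. This is an illustrative sketch only, not the washb implementation: `lr_screen()` is a hypothetical helper, the data are simulated, and it assumes each covariate is screened on its own by comparing an intercept-only GLM with a GLM that adds that covariate. It handles only the standard `glm()` families, so the "neg.binom" option is not covered.

```r
# Illustrative sketch only: a per-covariate likelihood ratio screen in base R.
# lr_screen() is a hypothetical helper, not the function documented here.
lr_screen <- function(Y, Ws, family = "gaussian", pval = 0.2) {
  pvals <- sapply(names(Ws), function(v) {
    dat <- data.frame(Y = Y, W = Ws[[v]])
    dat <- dat[complete.cases(dat), ]                # fit both models on the same rows
    fit0 <- glm(Y ~ 1, data = dat, family = family)  # intercept-only model
    fit1 <- glm(Y ~ W, data = dat, family = family)  # model with the covariate added
    # Likelihood ratio statistic and its degrees of freedom
    lr_stat <- as.numeric(2 * (logLik(fit1) - logLik(fit0)))
    df <- attr(logLik(fit1), "df") - attr(logLik(fit0), "df")
    pchisq(lr_stat, df = df, lower.tail = FALSE)
  })
  # Keep the names of covariates whose p-value is below the threshold
  names(pvals)[pvals < pval]
}

# Simulated data (an assumption for illustration): one covariate truly
# related to a binary outcome, one unrelated.
set.seed(1)
n <- 500
toy <- data.frame(x_signal = rnorm(n), x_noise = rnorm(n))
y <- rbinom(n, 1, plogis(-1 + 1.5 * toy$x_signal))

selected <- lr_screen(y, toy, family = "binomial")
print(selected)
```

The strongly associated covariate should be retained, while an unrelated covariate passes a 0.2 threshold only by chance, roughly one time in five.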
#Prescreen function applied to the Bangladesh diarrheal disease outcome.
#The function tests a data frame of covariates and returns those associated with child diarrheal disease
#at a p-value <0.2 from a likelihood ratio test.
#Load diarrhea data:
library(washb)
data(washb_bangladesh_enrol)
washb_bangladesh_enrol <- washb_bangladesh_enrol
data(washb_bangladesh_diar)
washb_bangladesh_diar <- washb_bangladesh_diar
# drop svydate and month because they are superseded in the child level diarrhea data
washb_bangladesh_enrol$svydate <- NULL
washb_bangladesh_enrol$month <- NULL
# merge the baseline dataset to the follow-up dataset
ad <- merge(washb_bangladesh_enrol,washb_bangladesh_diar,by=c("dataid","clusterid","block","tr"),all.x=FALSE,all.y=TRUE)
# subset to the relevant measurement
# Year 1 or Year 2
ad <- subset(ad,svy==1|svy==2)
#subset the diarrhea to children <36 mos at enrollment
### (exclude new births that are not target children)
ad <- subset(ad,sibnewbirth==0)
ad <- subset(ad,gt36mos==0)
# Exclude children with missing data
ad <- subset(ad,!is.na(ad$diar7d))
#Re-order the tr factor for convenience
ad$tr <- factor(ad$tr,levels=c("Control","Water","Sanitation","Handwashing","WSH","Nutrition","Nutrition + WSH"))
#Ensure that month is coded as a factor
ad$month <- factor(ad$month)
#Sort the data for perfect replication when using V-fold cross-validation
ad <- ad[order(ad$block,ad$clusterid,ad$dataid,ad$childid),]
###Subset to a new dataframe the variables to be screened:
Ws <- subset(ad,select=c("fracode","month","agedays","sex","momage","momedu","momheight","hfiacat","Nlt18","Ncomp","watmin","elec","floor","walls","roof","asset_wardrobe","asset_table","asset_chair","asset_khat","asset_chouki","asset_tv","asset_refrig","asset_bike","asset_moto","asset_sewmach","asset_mobile"))
###Run the washb_prescreen function
prescreened_varnames<-washb_prescreen(Y=ad$diar7d,Ws,family="binomial")
###Rerun the function with a more lenient p-value threshold (0.5 instead of the default 0.2)
prescreened_varname2s<-washb_prescreen(Y=ad$diar7d,Ws,family="binomial", pval=0.5)
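A typical next step is to keep only the screened covariates as the adjustment set. The sketch below assumes the return value can be used as a character vector of column names; `Ws_toy` and `selected_toy` are made-up stand-ins for `Ws` and the function's result, so that the snippet runs without the package data.

```r
# Toy stand-ins (hypothetical): in practice these would be Ws and the
# vector of names returned by the prescreening step.
Ws_toy <- data.frame(
  momage = c(22, 30, 27),
  sex    = factor(c("female", "male", "female")),
  roof   = c(1, 0, 1)
)
selected_toy <- c("momage", "sex")

# Keep only the screened covariates; drop = FALSE preserves the data frame
# structure even if a single covariate is selected.
W_adj <- Ws_toy[, selected_toy, drop = FALSE]
str(W_adj)
```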