In sbohora/sAUC: Semi-parametric Area Under the Curve (AUC) regression

knitr::opts_chunk$set(collapse = T, comment = "#>")
options(tibble.print_min = 4L, tibble.print_max = 4L)
library(sAUC)
set.seed(2017)

Perform AUC analyses with discrete covariates and a semi-parametric estimation

Installation

install.packages("sAUC")

OR

devtools::install_github("sbohora/sAUC")

Example

To illustrate how to apply the proposed method, we obtained data from a randomized and controlled clinical trial, which was designed to increase knowledge and awareness to prevent Fetal Alcohol Spectrum Disorders (FASD) in children through the development of printed materials targeting women of childbearing age in Russia. One of the study objectives was to evaluate effects of FASD education brochures with different types of information and visual images on FASD related knowledge, attitudes, and alcohol consumption on childbearing-aged women. The study was conducted in two regions in Russia including St. Petersburg (SPB) and the Nizhny Novgorod Region (NNR) from 2005 to 2008. A total of 458 women were recruited from women's clinics and were randomly assigned to one of three groups (defined by the GROUP variable): (1) a printed FASD prevention brochure with positive images and information stated in a positive way, positive group (PG), (2) a FASD education brochure with negative messages and vivid images, negative group (NG); and (3) a general health material, control group (CG). For the purpose of the analysis in this thesis, only women in the PG and CG were included. Data were obtained from the study principal investigators . The response variable was the change in the number of drinks per day (CHANGE_DRINK=number of drinks after-number of drinks before) on average in the last 30 days from one-month follow-up to baseline. Two covariates considered for the proposed method were "In the last 30 days, have you smoked cigarettes?" (SMOKE) and "In the last 30 days, did you take any other vitamins?" (OVITAMIN). Both covariates had "Yes" or "No" as the two levels. The question of interest here was to assess the joint predictive effects of SMOKE and OVITAMIN on whether the participants reduced the number of drinks per day from baseline to one month follow-up period. A total of 210 women with no missing data on any of the CHANGE_DRINK, SMOKE, GROUP, and OVITAMIN were included in the analysis.

The response variable CHANGE_DRINK was heavily skewed and not normally distributed in each group (Shapiro-Wilk p<0.001). Therefore, we decided to use the AUG regression model to analyze the data. In the AUG regression model we define $$\large \pi = p(Y_{CG} > Y_{PG})$$ Note that the value of $\large \pi$ greater than .5 means that women in the PG had a greater reduction of alcohol drinks than those in the CG. For statistical results, all p-values < .05 were considered statistically significant and 95% CIs were presented.

We first fit an AUC regression model including both main effects of the covariates. Note that the main effects of the covariates in fact represented their interactions with the GROUP variable, which is different than the linear or generalized linear model frame. The reason is that the GROUP variable is involved in defining the AUC. Tables below present the parameter estimates, SEs, p-values, and 95% CIs for model with one and two covariates. Because parameter $\beta_2$ was not significantly different from 0, we dropped OVITAMIN and fit another model including only the SMOKE main effect.Table below shows a significant interaction between SMOKE and GROUP because the SMOKE was statistically significant (95% CI: (0.05, 1.47)). Therefore, the final model was $$logit(\hat{\pi}_{Smoke}) = \hat{\beta_0} + \hat{\beta_1}*I(Smoke =Yes)$$. Because the interaction between SMOKE and GROUP was significant, we need to use AUC as a measure of the GROUP effect on CHANGE_DRINK for smokers and non-smokers separately using following formula for example for smokers;

$$\hat{\pi}_{Smoke} = \frac{e^{\hat{\beta_0} + \hat{\beta_1}Smoke =Yes}}{1 + e^{\hat{\beta_0} + \hat{\beta_1}Smoke =Yes}} $$ Specifically, the AUCs were 0.537 (insignificant) and 0.713 (significant) for non-smokers and smokers, respectively. This implies that the effect of positive and control brochures were similar for nonsmokers; however, for smokers, the probability that the positive brochure had a better effect than the control brochure in terms of alcohol reduction is 71.30%, indicating the positive brochure is a better option than the control brochure.

Result of sAUC Regression with one discrete covariate

library(sAUC)
library(DT)
library(shiny)

fasd_label <- sAUC::fasd
names(fasd_label) <- c("y", "group", "vitamin", "smoke")
fasd_label[, c("smoke", "vitamin", "group")] <- lapply(fasd_label[, c("smoke", "vitamin", "group")], function(x) factor(x))

result_one <- sAUC::sAUC(formula = y ~ smoke, treatment_group = "group", data = fasd_label)

result_two <- sAUC::sAUC(formula = y ~ smoke + vitamin, treatment_group = "group", data = fasd_label)

# result_one$`Model summary`
DT::datatable(as.data.frame(result_one$"Model summary"),
            caption = htmltools::tags$caption(
              style = "font-size:120%",
              strong('Model results'), '{Note: left-side of model :  ', result_one$"model_formula","}"),
            options = list(pageLength = 6, dom = 'tip'), rownames = TRUE)

Result of sAUC Regression with two discrete covariates

DT::datatable(as.data.frame(result_two$"Model summary"),
            caption = htmltools::tags$caption(
              style = "font-size:120%",
              strong('Model results'), '{Note: left-side of model :  ', result_two$"model_formula","}"),
            options = list(pageLength = 6, dom = 'tip'), rownames = TRUE)
# result_two$`Model summary`