In CJangelo/SPR: Tools to Learn Statistical Programming in R

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

Overview

Now we extend the OR to RR to include covariates. We can do this with categorical covariates or continuous covariates. There is an R packge with a function (epitools::probratio) that will help you do this, but I want to do it by hand so we all understand what is actually happening.

Also note that I recently had to compute the adjusted RR for an ordinal regression model. I couldn't use an R library, I had to do it by hand. Once you understand this code below, you'll be able to extend it to ordinal context.

library(SPR)
library(MASS)

N = 1e5 #this should be divisible by however many groups you use!
number.groups <- 2
number.timepoints <- 1
set.seed(9122021)

dat <- data.frame('USUBJID' = rep(paste0('Subject_', formatC(1:N, width = 4, flag = '0')), length.out= N*number.timepoints), 'Group' = rep(paste0('Group_', 1:number.groups), length.out = N*number.timepoints), stringsAsFactors=F)

# Generate a Covariate

# Option 1: Continuous, unbalanced:
# dat$Covariate <- rlnorm(n = N, mean = dat$Group == 'Group_2', sd = 0.25)
# range(dat$Covariate)
# aggregate(Covariate  ~ Group, FUN = function(x) round(mean(x, na.rm = T),2), dat = dat, na.action = na.pass)
# Covariate is unbalanced across groups, could serve as a confound

# Option 2: Categorical, Balanced:
# dat$Z <- sample(x = c(0,1), size = N, replace = T, prob = c(0.5, 0.5))

# Option 3: Categorical, Unbalanced:
dat$Z <- rbinom(n = N, size = 1, prob = 0.4 + 0.4*(dat$Group == 'Group_1'))
table(dat$Z)
xtabs(~ Group + Z, data = dat)


# Create Beta parameters for these design matrix:
X <- model.matrix( ~ Group + Z , data = dat)

# Create Beta
Beta <- matrix(0, nrow = ncol(X), dimnames=list(colnames(X), 'param'))
Beta[] <- c(-0.25, -1, -0.5)

# Matrix multiply:
XB <- X %*% Beta
p <- exp(XB) # prevalence - log link
round(range(p), 2)
#p <- exp(XB)/(1 + exp(XB)) - this is the logistic link

# Dichotomize
dat$Y_binom <- as.vector(1*(p > runif(n = N)))

# Plot
aggregate(Y_binom  ~ Group, FUN = function(x) round(mean(x, na.rm = T),2), dat = dat, na.action = na.pass)
barplot(100*table(dat$Y_binom )/sum(table(dat$Y_binom )), ylim = c(0, 100), ylab = 'Percentage', col = 'grey', main = 'Binomial')


library(ggplot2)
ggplot(data = dat, aes(x= as.factor(Y_binom), fill = Group)) +
      geom_bar(aes(y = (..count..)/sum(..count..)), color="#e9ecef", alpha=0.6, position="dodge", stat="count") +
      scale_y_continuous(labels=scales::percent_format(accuracy = 1)) +
      scale_fill_manual(name = 'Groups', values=c("blue2", "red2")) +
      theme_minimal() +
      theme(legend.position = 'bottom') +
      labs(x = 'Score', y = 'Percentage',
           title = 'Scores with a Binomial Distribution',
           caption = 'Note: Simulated data')

# Observed Data:
xtabs(~ Y_binom + Group, data = dat)
ptab <- prop.table(xtabs(~ Y_binom + Group, data = dat), 2)

Model Fitting - Log Binomial

This models the relative risk directly.

# Compare to model output:
mod <- glm(Y_binom ~ Group, data = dat, family = binomial(link ='log'))

# Adjusted Relative Risk - log binomial model with covariate (confound)
mod2 <- glm(Y_binom ~ Group + Z,
            start = as.vector(Beta),
            data = dat, family = binomial(link ='log'))

Model Fitting - Logistic Regression

#--------------------------------
# Logistic Regression:
mod.lr <- glm(Y_binom ~ Group + Z, data = dat,
              family = binomial(link ='logit'))

Now we have the odds ratio from the model, we can compute the relative risk using the formula we reviewed: OR/((1-P0) + (P0 * OR)). However, this formula requires the baseline prevalence.

Recover adjusted RR

We want the probability of Y given Group membership, averaged over covariates. This is computing the RR marginalizing over the covariates. So this is the RR specific to the sample characteristics. You've selected a base rate specific to the sample.

p1i <- predict(mod.lr, type = 'response',
              newdata = data.frame('Group' = 'Group_2',
                                  'Z' = dat$Z))

p0i <- predict(mod.lr, type = 'response',
              newdata = data.frame('Group' = 'Group_1',
                                   'Z' = dat$Z))

mean(p1i)/mean(p0i)

# This aligns with the generating value:
exp(Beta['GroupGroup_2', ])

Note on computation

https://escholarship.org/content/qt3ng2r0sm/qt3ng2r0sm.pdf

"Standardized measures are constructed by taking averages over C before comparisons (e.g., ratios or differences) across X...Recalling Jensen’s inequality (an average of a nonlinear function does not equal the function applied to the averages), it should not be surprising to find divergences between collapsibility conditions depending on the step at which averaging is done (Samuels, 1981, sec. 3)."

R package `epitools`

library(epitools)
# Function is 'probratio()'
# See documentation - marginalizes over observed covariates
pr <- epitools::probratio(mod.lr, 
                          method='delta', 
                          scale='linear') 
pr

Summary - All the RR

Let's put it all together and see if we can understand

# Generating parameter (true value):
exp(Beta['GroupGroup_2', ]) 

# Observed RR (just using observed proportions):
ptab # Group 1 & Group 2
ptab['1', 'Group_2']/ptab['1', 'Group_1'] 

# RR from unadjusted log-binomial model:
exp(coef(mod)['GroupGroup_2']) 

# RR from Adjusted log-binomial model
exp(coef(mod2)['GroupGroup_2']) 

# RR from adjusted Logistic Regression - Marginalized over covariates 
mean(p1i)/mean(p0i)

# same, but letting `epitools::probratio()` do all the work:
pr