
Summarise variables/factors by a categorical variable

summary.factorlist() is a simple wrapper used to summarise any number of variables by a single categorical variable. This is usually "Table 1" of a study report.

library(summarizer)
library(dplyr)
library(stringr)

# Load example dataset, modified version of survival::colon
data(colon_s)

# Table 1 - Patient demographics ----
explanatory = c("age", "age.factor", "sex.factor", "obstruct.factor")
dependent = "perfor.factor"
colon_s %>%
  summary.factorlist(dependent, explanatory, p=TRUE)
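
As a rough illustration of what one block of such a table contains, the sketch below cross-tabulates a single factor against a categorical variable in base R and attaches column percentages and a chi-squared p-value. The data frame `toy` and its columns are simulated for illustration and are not part of `colon_s`; `summary.factorlist()` itself may compute or format these values differently.

```r
# Simulated data, for illustration only (not colon_s)
set.seed(1)
toy <- data.frame(
  sex    = factor(sample(c("Female", "Male"), 200, replace = TRUE)),
  perfor = factor(sample(c("No", "Yes"), 200, replace = TRUE, prob = c(0.8, 0.2)))
)

# Counts of each sex level within each level of the grouping variable
counts <- table(toy$sex, toy$perfor, dnn = c("sex", "perfor"))

# Column percentages: share of each sex level within each perfor group
col_pct <- prop.table(counts, margin = 2) * 100

# Chi-squared test of association between the two factors
p_value <- chisq.test(counts)$p.value

# Combine into "n (%)" cells, one column per level of the grouping variable
cells <- matrix(
  sprintf("%d (%.1f)", as.vector(counts), as.vector(col_pct)),
  nrow = nrow(counts),
  dimnames = dimnames(counts)
)
table1 <- data.frame(
  level = rownames(cells),
  cells,
  p = c(format.pval(p_value, digits = 3), ""),
  stringsAsFactors = FALSE
)
print(table1)
```

The percentages here are taken within each level of the grouping variable (columns), which is one common convention for a "Table 1"; row percentages would use `margin = 1` instead.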

summary.factorlist() is also commonly used to summarise any number of variables by an outcome variable (for example, death: yes/no).

# Table 2 - 5 yr mortality ----
explanatory = c("age.factor", "sex.factor", "obstruct.factor", "perfor.factor")
dependent = 'mort_5yr'
colon_s %>%
  summary.factorlist(dependent, explanatory)

Summarise regression model results in final table format

The second main feature is the ability to create final tables for logistic glm(), hierarchical logistic lme4::glmer() and Cox proportional hazards survival::coxph() regression models.

The summarizer() "all-in-one" function takes a single dependent variable with a vector of explanatory variable names (continuous or categorical variables) to produce a final table for publication including summary statistics, univariable and multivariable regression analyses. The first columns are those produced by summary.factorlist().

glm

glm(dependent ~ explanatory, family="binomial")

explanatory = c("age.factor", "sex.factor", "obstruct.factor", "perfor.factor")
dependent = 'mort_5yr'
colon_s %>%
  summarizer(dependent, explanatory)
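
The call pattern above can be unpacked into plain `glm()` steps. The following is a minimal sketch, assuming the dependent and explanatory names are pasted into a formula and that each coefficient is exponentiated into an odds ratio with a Wald confidence interval. The simulated `toy` data and the helper `or_table()` are hypothetical, and `summarizer()` may construct its models differently.

```r
# Simulated data, for illustration only (not colon_s)
set.seed(42)
n <- 500
toy <- data.frame(
  age.factor = factor(sample(c("<40", "40-59", "60+"), n, replace = TRUE),
                      levels = c("<40", "40-59", "60+")),
  sex.factor = factor(sample(c("Female", "Male"), n, replace = TRUE))
)
# Binary outcome whose odds rise in the oldest age group
lin_pred <- -0.5 + 0.8 * (toy$age.factor == "60+")
toy$mort_5yr <- factor(ifelse(rbinom(n, 1, plogis(lin_pred)) == 1, "Died", "Alive"),
                       levels = c("Alive", "Died"))

dependent <- "mort_5yr"
explanatory <- c("age.factor", "sex.factor")

# Hypothetical helper: fit a logistic model and return OR with 95% Wald CI
or_table <- function(data, dependent, explanatory) {
  f <- as.formula(paste(dependent, "~", paste(explanatory, collapse = " + ")))
  fit <- glm(f, data = data, family = binomial)
  est <- cbind(OR = coef(fit), confint.default(fit))
  round(exp(est)[-1, , drop = FALSE], 2)  # drop the intercept row
}

# Univariable: one model per explanatory variable
univariable <- do.call(rbind, lapply(explanatory, function(v) or_table(toy, dependent, v)))
# Multivariable: all explanatory variables in one model
multivariable <- or_table(toy, dependent, explanatory)

print(univariable)
print(multivariable)
```

Fitting one model per variable and then one joint model is what distinguishes the "univariable" and "multivariable" columns of the final table.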

multi-level

Where a multivariable model contains a subset of the variables specified in the full univariable set, this can be specified.

explanatory = c("age.factor", "sex.factor", "obstruct.factor", "perfor.factor")
explanatory.multi = c("age.factor", "obstruct.factor")
dependent = 'mort_5yr'
colon_s %>%
  summarizer(dependent, explanatory, explanatory.multi)

Random effects.

lme4::glmer(dependent ~ explanatory + (1 | random_effect), family="binomial")

explanatory = c("age.factor", "sex.factor", "obstruct.factor", "perfor.factor")
explanatory.multi = c("age.factor", "obstruct.factor")
random.effect = "hospital"
dependent = 'mort_5yr'
colon_s %>%
  summarizer(dependent, explanatory, explanatory.multi, random.effect)

with metrics

Setting metrics=TRUE adds common model metrics to the output. Note that the result is then a list, so it defaults to a plain data.frame print out; knitr::kable() does not handle a list automatically.

colon_s %>%
  summarizer(dependent, explanatory, explanatory.multi,  metrics=TRUE)
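
For orientation, two metrics commonly reported for a logistic model are the AIC and the C-statistic (area under the ROC curve). The sketch below computes both in base R on simulated data. Which metrics `metrics=TRUE` actually returns is not stated here, so the selection, the `toy` data and the `c_statistic()` helper are all assumptions.

```r
# Simulated data, for illustration only
set.seed(7)
n <- 400
toy <- data.frame(x = rnorm(n))
toy$y <- rbinom(n, 1, plogis(-0.3 + 1.2 * toy$x))

fit <- glm(y ~ x, data = toy, family = binomial)

# C-statistic via the rank (Mann-Whitney) formulation: the probability that a
# randomly chosen event has a higher predicted risk than a randomly chosen non-event
c_statistic <- function(pred, outcome) {
  n1 <- sum(outcome == 1)
  n0 <- sum(outcome == 0)
  (sum(rank(pred)[outcome == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

# Collect the metrics in a list, alongside which a results table could be returned
metrics <- list(
  n_model     = nobs(fit),
  aic         = AIC(fit),
  c_statistic = c_statistic(fitted(fit), toy$y)
)
str(metrics)
```

Because the metrics sit in a list rather than a single data frame, each element has to be printed or passed to a table formatter separately.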

Cox proportional hazards

survival::coxph(dependent ~ explanatory)

explanatory = c("age.factor", "sex.factor", "obstruct.factor", "perfor.factor")
dependent = "Surv(time, status)"

colon_s %>% 
    summarizer(dependent, explanatory)
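
The same string-to-formula idea carries over to survival models. A minimal sketch using `survival::colon` (the dataset `colon_s` is derived from) follows. It assumes the dependent string is pasted unchanged onto the left-hand side of the formula, which is an assumption about `summarizer()` internals rather than documented behaviour; the restriction to death records is likewise a choice made for this example.

```r
library(survival)

dependent <- "Surv(time, status)"
explanatory <- c("sex", "obstruct", "perfor")  # raw 0/1 columns in survival::colon

# survival::colon holds two rows per patient (recurrence and death);
# keep the death records only (etype == 2) for this illustration
deaths <- subset(colon, etype == 2)

# Build the model formula from the character inputs
f <- as.formula(paste(dependent, "~", paste(explanatory, collapse = " + ")))
fit <- coxph(f, data = deaths)

# Hazard ratios with 95% confidence intervals
hr <- cbind(HR = exp(coef(fit)), exp(confint(fit)))
print(round(hr, 2))
```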

Subsets

Rather than going all-in-one, any number of subset models can be manually added on to a summary.factorlist() table using summarizer.merge(). This is particularly useful when models take a long time to run or are complicated.

glm

Note that glm.id=TRUE is required in the summary.factorlist() call. fit2df() is a helper function that extracts the results of the most common model types to a data frame.

explanatory = c("age.factor", "sex.factor", "obstruct.factor", "perfor.factor")
explanatory.multi = c("age.factor", "obstruct.factor")
random.effect = "hospital"
dependent = 'mort_5yr'

# Separate tables
colon_s %>%
  summary.factorlist(dependent, explanatory, glm.id=TRUE) -> example.summary

colon_s %>%
  glmuni(dependent, explanatory) %>%
  fit2df(estimate.suffix=" (univariable)") -> example.univariable

colon_s %>%
  glmmulti(dependent, explanatory) %>%
  fit2df(estimate.suffix=" (multivariable)") -> example.multivariable


colon_s %>%
  glmmixed(dependent, explanatory, random.effect) %>%
  fit2df(estimate.suffix=" (multilevel)") -> example.multilevel

# Pipe together
example.summary %>% 
  summarizer.merge(example.univariable) %>% 
  summarizer.merge(example.multivariable) %>% 
  summarizer.merge(example.multilevel) %>% 
  select(-c(glm.id, index)) -> example.final
example.final
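
Conceptually, each merge step is a keyed join between the summary table and a model table. The sketch below imitates that with base `merge()` on simulated data. The key column `term`, the helper `or_column()` and the column labels are hypothetical stand-ins, and `summarizer.merge()` may key and order rows differently (for example via the `glm.id` and `index` columns dropped above).

```r
# Simulated data, for illustration only
set.seed(3)
n <- 300
toy <- data.frame(grp = factor(sample(c("A", "B", "C"), n, replace = TRUE)))
toy$y <- rbinom(n, 1, plogis(-0.4 + 0.7 * (toy$grp == "C")))

# Summary table: one row per factor level, keyed by the model term name
counts <- table(toy$grp, toy$y)
summary_tab <- data.frame(
  term  = paste0("grp", rownames(counts)),
  level = rownames(counts),
  n_0   = as.vector(counts[, "0"]),
  n_1   = as.vector(counts[, "1"]),
  stringsAsFactors = FALSE
)

# Hypothetical helper: formatted "OR (lower-upper)" per non-reference term
or_column <- function(fit, label) {
  est <- exp(cbind(coef(fit), confint.default(fit)))[-1, , drop = FALSE]
  out <- data.frame(
    term  = rownames(est),
    value = sprintf("%.2f (%.2f-%.2f)", est[, 1], est[, 2], est[, 3]),
    stringsAsFactors = FALSE
  )
  names(out)[2] <- label
  out
}
model_tab <- or_column(glm(y ~ grp, data = toy, family = binomial), "OR (univariable)")

# Left join keeps the reference level, which has no model row
final <- merge(summary_tab, model_tab, by = "term", all.x = TRUE)
final[is.na(final)] <- "-"
print(final[, -1])  # drop the key column, as the id columns are dropped above
```

Keeping the key in the summary table until the last step is what allows further model tables to be joined on one at a time.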

Cox Proportional Hazards example with separate tables merged together.

explanatory = c("age.factor", "sex.factor", "obstruct.factor", "perfor.factor")
explanatory.multi = c("age.factor", "obstruct.factor")
dependent = "Surv(time, status)"

# Separate tables
colon_s %>%
    summary.factorlist(dependent, explanatory, glm.id=TRUE) -> example2.summary

colon_s %>%
    coxphuni(dependent, explanatory) %>%
    fit2df(estimate.suffix=" (univariable)") -> example2.univariable

colon_s %>%
  coxphmulti(dependent, explanatory.multi) %>%
  fit2df(estimate.suffix=" (multivariable)") -> example2.multivariable

# Pipe together
example2.summary %>% 
    summarizer.merge(example2.univariable) %>% 
    summarizer.merge(example2.multivariable) %>% 
    select(-c(glm.id, index)) -> example2.final
example2.final

Summarise regression model results in plot

Models can be summarized with odds ratio/hazard ratio plots using or.plot or hr.plot (hr.plot not fully tested).

# OR plot
explanatory = c("age.factor", "sex.factor", "obstruct.factor", "perfor.factor")
dependent = 'mort_5yr'
colon_s %>%
  or.plot(dependent, explanatory)
# Previously fitted models (`glmmulti`) can be provided directly to `glmfit`  

# HR plot (not fully tested)
explanatory = c("age.factor", "sex.factor", "obstruct.factor", "perfor.factor")
dependent = "Surv(time, status)"
colon_s %>%
  hr.plot(dependent, explanatory, dependent_label = "Survival")
# Previously fitted models (`coxphmulti`) can be provided directly using `coxfit`

Our own particular Rstan models are supported and will be documented in the future. Broadly, if you are running (hierarchical) logistic regression models in Stan with coefficients specified as a vector labelled beta, then fit2df() will work directly on the stanfit object, much as it does on a glm or glmerMod object.

Notes

Use Hmisc::label() to assign labels to variables for tables and plots.

label(colon_s$age.factor) = "Age (years)"

Export dataframe tables directly or to R Markdown using knitr::kable().

The wrapper summary.missing() can be useful for inspecting missing data. It wraps mice::md.pattern().

colon_s %>%
  summary.missing(dependent, explanatory)
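
A base R sketch of the kind of information a missing-data pattern table conveys: the number of missing values per variable, and how often each combination of observed and missing variables occurs. The `toy` data are simulated, and the layout differs from what `mice::md.pattern()` prints.

```r
# Simulated data, for illustration only
set.seed(11)
toy <- data.frame(
  age      = rnorm(100, 60, 10),
  sex      = sample(c("Female", "Male"), 100, replace = TRUE),
  obstruct = sample(c("No", "Yes"), 100, replace = TRUE),
  stringsAsFactors = FALSE
)
# Knock out some values at random
toy$age[sample(100, 8)] <- NA
toy$obstruct[sample(100, 15)] <- NA

# Missing values per variable
n_missing <- colSums(is.na(toy))

# Missingness patterns: 1 = observed, 0 = missing, with the row count per pattern
observed <- as.data.frame(ifelse(is.na(toy), 0L, 1L))
patterns <- aggregate(n ~ ., data = cbind(observed, n = 1), FUN = sum)
patterns <- patterns[order(-patterns$n), ]

print(n_missing)
print(patterns)
```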


ewenharrison/summarizer documentation built on May 16, 2019, 9:41 a.m.