Table of contents:

Creating tables of summary statistics

 


The immediate results of statistical analyses are rarely fit for general consumption. This vignette walks through strategies and functions I've created to help make my results ready for presentation/dissemination.

library(tidyverse)
library(bfuncs)
data(mtcars)

 


Creating tables of summary statistics {#stat-tables}

Tables are one of the most popular ways to present information to your audience for a reason. When your goal is to show your readers precise numerical summaries of your data, the exact values that resulted from your analysis are important.

However, the specific details that you may want to present from table to table, or the way in which you want to present them, can vary widely. For example, an initial table of descriptive information about your data may or may not include important subgroups of interest, it may include variables of different types, and for any given type, it may include various different kinds of numerical summaries. Additionally, there are numerous types of inferential models that result in numerical summaries that you may want to present in a table. For these reasons, it is difficult to create a good one-size-fits-all approach to automating the process of making results ready for presentation and dissemination.

Having said that, below I walk through the basic strategy I’ve developed over time for moving closer to automating the processes of putting my results into presentation-ready summary tables.

In general, the flow goes something like this:

  1. Start with a .Rmd file whose output is a word document

  2. Create a table shell

  3. Calculate the statistics of interest

  4. Format the statistics of interest for presentation

  5. Fill in the table shell

  6. Polish up the filled in table shell

  7. Knit to Kable

  8. Polish up the resulting Word document

I’ll walk through each step below…

 


Start with a .Rmd file whose output is a Word document {#start}

Create a new R markdown document. Set the output format to "Word". Edit the YAML header should look something like this:

---
title: "Table 1. Descriptive Characteristics"
output:
  word_document:
    reference_docx: word_style_template_01.docx
---

This file should only be used to create this one specific table, and most of the code chunks should have the echo=FALSE option set. In other words, the file should only create a single table, and the only code chunk that creates a result that viewable in the output Word document should be the final code chunk that prints your filled-in table shell.

Finally, to limit the amount of polishing you need to do later in the process, you can reference a Word template document in the YAML header, as I've done above. More information about the Word templates is available here:

http://rmarkdown.rstudio.com/articles_docx.html

top

 


Create a table shell {#make-shell}

table <- tibble(
  variable = "", # Variable names
  class    = "", # Classes for categorical variables
  am_0     = "", # Group 1
  am_1     = ""  # Group 2
)

Creating tables of summary statistics

top

 


Calculate the statistics of interest {#stats-for-table}

This part may vary quite a bit from table to table. However, every descriptive table should include group sample sizes. The get_grou_n is helpful for calculating the group sample size, and returning it in a presentation-ready format.

# Same as above, add n's to table
table <- tibble(
  variable = "",
  class    = "",
  am_0     = mtcars %>% bfuncs::get_group_n(am == 0),
  am_1     = mtcars %>% bfuncs::get_group_n(am == 1)
)

Calculating formatted descriptive statistics for continuous variables

The descriptive analysis vignette gives examples for calculating common descriptive statistics for continuous variables. For example:

mtcars %>% group_by(am) %>% mean_table(mpg)

Although the statistics we need are there, they aren't in a form that can easily be put into a Word table. To help with that process, I've created the format_table function.

The format_table function is an S3 generic. It currently has methods for formatting the output of the freq_table and mean_table functions. Examples of how to use of format_table will be give here and below.

Example: mean and 95% confidence interval (the default) overall

mtcars %>% mean_table(mpg) %>% format_table()

Example: n and mean overall

mtcars %>% mean_table(mpg) %>% format_table(stats = "n and mean")

Example: mean and 95% confidence interval (the default) by subgroup

mtcars %>% group_by(am) %>% mean_table(mpg) %>% format_table()

Example: n and mean by subgroup

mtcars %>% group_by(am) %>% mean_table(mpg) %>% format_table(stats = "n and mean")

 


Calculating formatted descriptive statistics for categorical variables

This section is similar to the previous, except we are now analyzing categorical variables.

Example: Overall percent and 95% confidence interval (the default)

mtcars %>% group_by(am) %>% freq_table() %>% format_table()

Example: Overall n and percent

mtcars %>% group_by(am) %>% freq_table() %>% format_table(stats = "n and percent")

Example: Row percent and 95% confidence interval (the default)

mtcars %>% group_by(am, cyl) %>% freq_table() %>% format_table()

Example: N and row percent

mtcars %>% group_by(am, cyl) %>% freq_table(output = "all") %>% format_table(stats = "n and percent")

Creating tables of summary statistics

top

 


Further formatting for presentation {#format-stats}

At this point, the statistics of interest are calculated and the values themselves are formatted. If we are not comparing two or more groups in the current table, we may be able to skip this section and go straight to filling in the table shell.

However, if we are comparing two or more groups, then some additional formatting is needed. Currently, our groups make up the rows of the formatted table. Typically, we want the groups to make up the columns of the formatted table. I've found tidyr::spread() to be really useful for fixing this problem. For example:

mtcars %>% 
  group_by(am, cyl) %>% 
  freq_table() %>% 
  format_table() %>% 
  spread(key = row_cat, value = percent_row_95)

Where:

We are getting closer to having this in a form that can be used to fill out our table shell. However, we still need to assign the name of the first column (cyl) as a value to a new column named "variable" and we need to change the names of the existing variables to "class", am_0, and am_1 to match our table shell.

mtcars %>% 
  group_by(am, cyl) %>% 
  freq_table() %>% 
  format_table() %>% 
  spread(key = row_cat, value = percent_row_95) %>% 
  select(-row_var) %>% 
  # Rename to row bind with table shell
  rename(variable = col_var, class = col_cat, "am_0" = `0`, "am_1" = `1`) 

There may be better ways to do this, but this is the best I've found so far.

Creating tables of summary statistics

top

 


Fill in the table shell {#fill-shell}

Again, there are multiple ways to go about completing this step. I will need to build out this section over time. In the current example the most straight forward method is probably to think of each characteristic of interest as a row in the table. We will stack the rows on top of each other the create the full table using dplyr::bind_rows().

First, an example using just one variable:

row <- mtcars %>% 
  group_by(am, cyl) %>% 
  freq_table() %>% 
  format_table() %>% 
  spread(key = row_cat, value = percent_row_95) %>% 
  select(-row_var) %>% 
  rename(variable = col_var, class = col_cat, "am_0" = `0`, "am_1" = `1`) %>% 
  mutate(class = as.character(class)) # Need for bind_rows below

bind_rows(table, row)

Then, it's trivial to extend this method to multiple variables using a for loop.

# Select variables 
cat_vars <- quos(cyl, vs)

# Execute freq_table and row bind results to table
for (i in seq_along(cat_vars)) {

  # Calculate mean and 95% CI
  row <- mtcars %>% 
    group_by(am, !!cat_vars[[i]]) %>% 
    freq_table() %>% 
    format_table() %>% 
    spread(key = row_cat, value = percent_row_95) %>% 
    select(-row_var) %>% 
    rename(variable = col_var, class = col_cat, "am_0" = `0`, "am_1" = `1`) %>% 
    mutate(class = as.character(class))

  # Append to bottom of table
  table <- bind_rows(table, row)
}

print(table)

Example: Continuous and categorical variables

# Reset the table shell
# ---------------------
table <- tibble(
  variable = "",
  class    = "",
  am_0     = mtcars %>% bfuncs::get_group_n(am == 0),
  am_1     = mtcars %>% bfuncs::get_group_n(am == 1)
)


# Select variables
# ----------------
cont_vars <- quos(mpg, disp)
cat_vars  <- quos(cyl, vs)


# Fill in continuous variables
# ----------------------------
for (i in seq_along(cont_vars)) {

  # Calculate mean and 95% CI
  row <- mtcars %>% 
    group_by(am) %>% 
    bfuncs::mean_table(!!cont_vars[[i]]) %>% 
    bfuncs::format_table() %>% 
    spread(key = group_cat, value = mean_95) %>% 
    select(-group_var) %>% 
    rename("variable" = response_var, "am_0" = `0`, "am_1" = `1`)

  # Append to bottom of table
  table <- bind_rows(table, row)
}


# Fill in categorical variables
# -----------------------------
for (i in seq_along(cat_vars)) {

  # Calculate mean and 95% CI
  row <- mtcars %>% 
    group_by(am, !!cat_vars[[i]]) %>% 
    freq_table() %>% 
    format_table() %>% 
    spread(key = row_cat, value = percent_row_95) %>% 
    select(-row_var) %>% 
    rename(variable = col_var, class = col_cat, "am_0" = `0`, "am_1" = `1`) %>% 
    mutate(class = as.character(class))

  # Append to bottom of table
  table <- bind_rows(table, row)
}

print(table)

Creating tables of summary statistics

top

& nbsp;


Polish up the filled in table shell {#polish-shell}

At this point, we basically have the table. Now, we just need to do a couple more things before knitting our Word document.

Example: Improve row headers

table <- table %>%
  mutate(
    variable = if_else(variable == "mpg", "Miles per gallon, mean (95% CI)", variable),
    variable = if_else(variable == "disp", "Displacement, mean (95% CI)", variable),
    variable = if_else(variable == "cyl", "Number of cylinders, percent (95% CI)", variable),
    variable = if_else(variable == "vs", "V/S, percent (95% CI)", variable)
  ) %>% 
  print

Example: Remove duplicate variable names for categorical variables

table <- table %>%
  group_by(variable) %>%
  mutate(
    x = duplicated(variable),
    x = if_else(variable == "", NA, x)
  ) %>%
  ungroup() %>%
  mutate(
    variable = if_else(x == TRUE, "", variable),
    variable = if_else(is.na(variable), "", variable),
    x = NULL
  ) %>% 
  print

Example: Slide classes to the left, under variable names

# table %>%
#   mutate(
#     class = stringr::str_replace(class, "^", "---"),
#     variable = if_else(variable == "", class, variable),
#     class = NULL
#   )

Creating tables of summary statistics

top

 


Knit to Kable {#kable}

table_kable <- knitr::kable(table, col.names = c(
  "Characteristic",
  "Class",
  "Automatic Transmission", 
  "manual Transmission")
)

print(table_kable)

Creating tables of summary statistics

top

 


Polish up the resulting Word document {#polish_word}

Typically do some stuff like:

There may be packages now that con do some of this programatically. Look at officer and wordr.

Creating tables of summary statistics

top

 



brad-cannell/bfuncs documentation built on July 21, 2019, 10:45 a.m.