In brad-cannell/bfuncs: This Repository Is A Random Smattering Of Functions I Wrote For Myself - You Can Use Them Too

Table of contents:

The immediate results of statistical analyses are rarely fit for general consumption. This vignette walks through strategies and functions I've created to help make my results ready for presentation/dissemination.

library(tidyverse)
library(bfuncs)

data(mtcars)

Creating tables of summary statistics {#stat-tables}

Tables are one of the most popular ways to present information to your audience for a reason. When your goal is to show your readers precise numerical summaries of your data, the exact values that resulted from your analysis are important.

However, the specific details that you may want to present from table to table, or the way in which you want to present them, can vary widely. For example, an initial table of descriptive information about your data may or may not include important subgroups of interest, it may include variables of different types, and for any given type, it may include various different kinds of numerical summaries. Additionally, there are numerous types of inferential models that result in numerical summaries that you may want to present in a table. For these reasons, it is difficult to create a good one-size-fits-all approach to automating the process of making results ready for presentation and dissemination.

Having said that, below I walk through the basic strategy I’ve developed over time for moving closer to automating the processes of putting my results into presentation-ready summary tables.

In general, the flow goes something like this:

Start with a .Rmd file whose output is a word document
Create a table shell
Calculate the statistics of interest
Format the statistics of interest for presentation
Fill in the table shell
Polish up the filled in table shell
Knit to Kable
Polish up the resulting Word document

I’ll walk through each step below…

Start with a .Rmd file whose output is a Word document {#start}

Create a new R markdown document. Set the output format to "Word". Edit the YAML header should look something like this:

---
title: "Table 1. Descriptive Characteristics"
output:
  word_document:
    reference_docx: word_style_template_01.docx
---

This file should only be used to create this one specific table, and most of the code chunks should have the echo=FALSE option set. In other words, the file should only create a single table, and the only code chunk that creates a result that viewable in the output Word document should be the final code chunk that prints your filled-in table shell.

Finally, to limit the amount of polishing you need to do later in the process, you can reference a Word template document in the YAML header, as I've done above. More information about the Word templates is available here:

http://rmarkdown.rstudio.com/articles_docx.html

top

Create a table shell {#make-shell}

table <- tibble(
  variable = "", # Variable names
  class    = "", # Classes for categorical variables
  am_0     = "", # Group 1
  am_1     = ""  # Group 2
)

Of course, you can add more groups (or fewer groups) as needed.
I find it's best to keep the column names easy to work with at this point. We will make them more presentation ready later on.

Creating tables of summary statistics

top

Calculate the statistics of interest {#stats-for-table}

This part may vary quite a bit from table to table. However, every descriptive table should include group sample sizes. The get_grou_n is helpful for calculating the group sample size, and returning it in a presentation-ready format.

# Same as above, add n's to table
table <- tibble(
  variable = "",
  class    = "",
  am_0     = mtcars %>% bfuncs::get_group_n(am == 0),
  am_1     = mtcars %>% bfuncs::get_group_n(am == 1)
)

Calculating formatted descriptive statistics for continuous variables

The descriptive analysis vignette gives examples for calculating common descriptive statistics for continuous variables. For example:

mtcars %>% group_by(am) %>% mean_table(mpg)

Although the statistics we need are there, they aren't in a form that can easily be put into a Word table. To help with that process, I've created the format_table function.

The format_table function is an S3 generic. It currently has methods for formatting the output of the freq_table and mean_table functions. Examples of how to use of format_table will be give here and below.

Example: mean and 95% confidence interval (the default) overall

mtcars %>% mean_table(mpg) %>% format_table()

Note: Changing the digits argument to format_table will change the number of digits displayed, but does not change the underlying rounding of the value. That must be changed in the digits argument to mean_table.

Example: n and mean overall

mtcars %>% mean_table(mpg) %>% format_table(stats = "n and mean")

Example: mean and 95% confidence interval (the default) by subgroup

mtcars %>% group_by(am) %>% mean_table(mpg) %>% format_table()

Example: n and mean by subgroup

mtcars %>% group_by(am) %>% mean_table(mpg) %>% format_table(stats = "n and mean")

Calculating formatted descriptive statistics for categorical variables

This section is similar to the previous, except we are now analyzing categorical variables.

Example: Overall percent and 95% confidence interval (the default)

mtcars %>% group_by(am) %>% freq_table() %>% format_table()

Example: Overall n and percent

mtcars %>% group_by(am) %>% freq_table() %>% format_table(stats = "n and percent")

Example: Row percent and 95% confidence interval (the default)

mtcars %>% group_by(am, cyl) %>% freq_table() %>% format_table()

Example: N and row percent

mtcars %>% group_by(am, cyl) %>% freq_table(output = "all") %>% format_table(stats = "n and percent")

Creating tables of summary statistics

top

Further formatting for presentation {#format-stats}

At this point, the statistics of interest are calculated and the values themselves are formatted. If we are not comparing two or more groups in the current table, we may be able to skip this section and go straight to filling in the table shell.

However, if we are comparing two or more groups, then some additional formatting is needed. Currently, our groups make up the rows of the formatted table. Typically, we want the groups to make up the columns of the formatted table. I've found tidyr::spread() to be really useful for fixing this problem. For example:

mtcars %>% 
  group_by(am, cyl) %>% 
  freq_table() %>% 
  format_table() %>% 
  spread(key = row_cat, value = percent_row_95)

Where:

Key = the variable whose levels make up our comparison groups of interest
Value = the values we are interested in comparing

We are getting closer to having this in a form that can be used to fill out our table shell. However, we still need to assign the name of the first column (cyl) as a value to a new column named "variable" and we need to change the names of the existing variables to "class", am_0, and am_1 to match our table shell.

mtcars %>% 
  group_by(am, cyl) %>% 
  freq_table() %>% 
  format_table() %>% 
  spread(key = row_cat, value = percent_row_95) %>% 
  select(-row_var) %>% 
  # Rename to row bind with table shell
  rename(variable = col_var, class = col_cat, "am_0" = `0`, "am_1" = `1`)

There may be better ways to do this, but this is the best I've found so far.

Creating tables of summary statistics

top

Fill in the table shell {#fill-shell}

Again, there are multiple ways to go about completing this step. I will need to build out this section over time. In the current example the most straight forward method is probably to think of each characteristic of interest as a row in the table. We will stack the rows on top of each other the create the full table using dplyr::bind_rows().

First, an example using just one variable:

row <- mtcars %>% 
  group_by(am, cyl) %>% 
  freq_table() %>% 
  format_table() %>% 
  spread(key = row_cat, value = percent_row_95) %>% 
  select(-row_var) %>% 
  rename(variable = col_var, class = col_cat, "am_0" = `0`, "am_1" = `1`) %>% 
  mutate(class = as.character(class)) # Need for bind_rows below

bind_rows(table, row)

Then, it's trivial to extend this method to multiple variables using a for loop.

# Select variables 
cat_vars <- quos(cyl, vs)

# Execute freq_table and row bind results to table
for (i in seq_along(cat_vars)) {

  # Calculate mean and 95% CI
  row <- mtcars %>% 
    group_by(am, !!cat_vars[[i]]) %>% 
    freq_table() %>% 
    format_table() %>% 
    spread(key = row_cat, value = percent_row_95) %>% 
    select(-row_var) %>% 
    rename(variable = col_var, class = col_cat, "am_0" = `0`, "am_1" = `1`) %>% 
    mutate(class = as.character(class))

  # Append to bottom of table
  table <- bind_rows(table, row)
}

print(table)

Still thinking about potentially more elegant ways to go about the filling in the table shell step. It's challenging because there are just so many different variations on the state of the data at the beginning of this step and the format of the table I want to result from this step.
An additional drawback to this method is that intermingling continuous and categorical variables requires multiple for loops. This is probably something worth addressing in the future.

Example: Continuous and categorical variables

# Reset the table shell
# ---------------------
table <- tibble(
  variable = "",
  class    = "",
  am_0     = mtcars %>% bfuncs::get_group_n(am == 0),
  am_1     = mtcars %>% bfuncs::get_group_n(am == 1)
)


# Select variables
# ----------------
cont_vars <- quos(mpg, disp)
cat_vars  <- quos(cyl, vs)


# Fill in continuous variables
# ----------------------------
for (i in seq_along(cont_vars)) {

  # Calculate mean and 95% CI
  row <- mtcars %>% 
    group_by(am) %>% 
    bfuncs::mean_table(!!cont_vars[[i]]) %>% 
    bfuncs::format_table() %>% 
    spread(key = group_cat, value = mean_95) %>% 
    select(-group_var) %>% 
    rename("variable" = response_var, "am_0" = `0`, "am_1" = `1`)

  # Append to bottom of table
  table <- bind_rows(table, row)
}


# Fill in categorical variables
# -----------------------------
for (i in seq_along(cat_vars)) {

  # Calculate mean and 95% CI
  row <- mtcars %>% 
    group_by(am, !!cat_vars[[i]]) %>% 
    freq_table() %>% 
    format_table() %>% 
    spread(key = row_cat, value = percent_row_95) %>% 
    select(-row_var) %>% 
    rename(variable = col_var, class = col_cat, "am_0" = `0`, "am_1" = `1`) %>% 
    mutate(class = as.character(class))

  # Append to bottom of table
  table <- bind_rows(table, row)
}

print(table)

Again, this still feels clunky.

Creating tables of summary statistics

top

& nbsp;

Polish up the filled in table shell {#polish-shell}

At this point, we basically have the table. Now, we just need to do a couple more things before knitting our Word document.

Improve the row headers
Remove duplicate variable names for categorical variables
Slide classes to the left, under variable names

Example: Improve row headers

table <- table %>%
  mutate(
    variable = if_else(variable == "mpg", "Miles per gallon, mean (95% CI)", variable),
    variable = if_else(variable == "disp", "Displacement, mean (95% CI)", variable),
    variable = if_else(variable == "cyl", "Number of cylinders, percent (95% CI)", variable),
    variable = if_else(variable == "vs", "V/S, percent (95% CI)", variable)
  ) %>% 
  print

Example: Remove duplicate variable names for categorical variables

table <- table %>%
  group_by(variable) %>%
  mutate(
    x = duplicated(variable),
    x = if_else(variable == "", NA, x)
  ) %>%
  ungroup() %>%
  mutate(
    variable = if_else(x == TRUE, "", variable),
    variable = if_else(is.na(variable), "", variable),
    x = NULL
  ) %>% 
  print

This point, we may be content with the formatting. However, we may also want to slide the class values under the variable values and drop the class column.

Example: Slide classes to the left, under variable names

Add tabs in front of classes
For some reason, R automatically strips the leading white space.
The best work around I can come up with is to add dashes, then find and replaces dashes with white space in Word.

# table %>%
#   mutate(
#     class = stringr::str_replace(class, "^", "---"),
#     variable = if_else(variable == "", class, variable),
#     class = NULL
#   )

Not working. Need to figure out how I did this before.

Creating tables of summary statistics

top

Knit to Kable {#kable}

table_kable <- knitr::kable(table, col.names = c(
  "Characteristic",
  "Class",
  "Automatic Transmission", 
  "manual Transmission")
)

print(table_kable)

Get rid of class NA.

Creating tables of summary statistics

top

Polish up the resulting Word document {#polish_word}

Typically do some stuff like:

Find and replace --- with white space
Reorient to landscape
Remove bold from title - except "Table 1"
Center columns 3 and above from top to bottom
Adjust column widths as needed
Add bottom border to table

There may be packages now that con do some of this programatically. Look at officer and wordr.

Creating tables of summary statistics

top

brad-cannell/bfuncs documentation built on July 21, 2019, 10:45 a.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

brad-cannell/bfuncs
This Repository Is A Random Smattering Of Functions I Wrote For Myself - You Can Use Them Too

In brad-cannell/bfuncs: This Repository Is A Random Smattering Of Functions I Wrote For Myself - You Can Use Them Too

Creating tables of summary statistics {#stat-tables}

Start with a .Rmd file whose output is a Word document {#start}

Create a table shell {#make-shell}

Calculate the statistics of interest {#stats-for-table}

Calculating formatted descriptive statistics for continuous variables

Example: mean and 95% confidence interval (the default) overall

Example: n and mean overall

Example: mean and 95% confidence interval (the default) by subgroup

Example: n and mean by subgroup

Calculating formatted descriptive statistics for categorical variables

Example: Overall percent and 95% confidence interval (the default)

Example: Overall n and percent

Example: Row percent and 95% confidence interval (the default)

Example: N and row percent

Further formatting for presentation {#format-stats}

Fill in the table shell {#fill-shell}

Example: Continuous and categorical variables

Polish up the filled in table shell {#polish-shell}

Example: Improve row headers

Example: Remove duplicate variable names for categorical variables

Example: Slide classes to the left, under variable names

Knit to Kable {#kable}

Polish up the resulting Word document {#polish_word}

R Package Documentation

Browse R Packages

We want your feedback!

brad-cannell/bfuncs This Repository Is A Random Smattering Of Functions I Wrote For Myself - You Can Use Them Too

In brad-cannell/bfuncs: This Repository Is A Random Smattering Of Functions I Wrote For Myself - You Can Use Them Too

Creating tables of summary statistics {#stat-tables}

Start with a .Rmd file whose output is a Word document {#start}

Create a table shell {#make-shell}

Calculate the statistics of interest {#stats-for-table}

Calculating formatted descriptive statistics for continuous variables

Example: mean and 95% confidence interval (the default) overall

Example: n and mean overall

Example: mean and 95% confidence interval (the default) by subgroup

Example: n and mean by subgroup

Calculating formatted descriptive statistics for categorical variables

Example: Overall percent and 95% confidence interval (the default)

Example: Overall n and percent

Example: Row percent and 95% confidence interval (the default)

Example: N and row percent

Further formatting for presentation {#format-stats}

Fill in the table shell {#fill-shell}

Example: Continuous and categorical variables

Polish up the filled in table shell {#polish-shell}

Example: Improve row headers

Example: Remove duplicate variable names for categorical variables

Example: Slide classes to the left, under variable names

Knit to Kable {#kable}

Polish up the resulting Word document {#polish_word}

R Package Documentation

Browse R Packages

We want your feedback!

brad-cannell/bfuncs
This Repository Is A Random Smattering Of Functions I Wrote For Myself - You Can Use Them Too