Creating tables of summary statistics
The immediate results of statistical analyses are rarely fit for general consumption. This vignette walks through strategies and functions I've created to help make my results ready for presentation/dissemination.
library(tidyverse)
library(bfuncs)
data(mtcars)
Tables are one of the most popular ways to present information to your audience for a reason. When your goal is to show your readers precise numerical summaries of your data, the exact values that resulted from your analysis are important.
However, the specific details that you may want to present from table to table, or the way in which you want to present them, can vary widely. For example, an initial table of descriptive information about your data may or may not include important subgroups of interest, it may include variables of different types, and for any given type, it may include various different kinds of numerical summaries. Additionally, there are numerous types of inferential models that result in numerical summaries that you may want to present in a table. For these reasons, it is difficult to create a good one-size-fits-all approach to automating the process of making results ready for presentation and dissemination.
Having said that, below I walk through the basic strategy I’ve developed over time for moving closer to automating the processes of putting my results into presentation-ready summary tables.
In general, the flow goes something like this:
I’ll walk through each step below…
Create a new R markdown document. Set the output format to "Word". Edit the YAML header so that it looks something like this:
---
title: "Table 1. Descriptive Characteristics"
output:
  word_document:
    reference_docx: word_style_template_01.docx
---
This file should only be used to create this one specific table, and most of the code chunks should have the echo=FALSE option set. In other words, the file should only create a single table, and the only code chunk that creates a result that is viewable in the output Word document should be the final code chunk, which prints your filled-in table shell.
Finally, to limit the amount of polishing you need to do later in the process, you can reference a Word template document in the YAML header, as I've done above. More information about the Word templates is available here:
http://rmarkdown.rstudio.com/articles_docx.html
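Rather than repeating echo=FALSE on every chunk, one option is to set it once as a default in a setup chunk near the top of the file. This is a minimal sketch using knitr's chunk options; it is not part of the workflow described above, and individual chunks can still override the default.

```r
# Set echo = FALSE as the default for every chunk that follows
knitr::opts_chunk$set(echo = FALSE)
```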
table <- tibble(
  variable = "", # Variable names
  class    = "", # Classes for categorical variables
  am_0     = "", # Group 1
  am_1     = ""  # Group 2
)
Of course, you can add more groups (or fewer groups) as needed.
I find it's best to keep the column names easy to work with at this point. We will make them more presentation ready later on.
This part may vary quite a bit from table to table. However, every descriptive table should include group sample sizes. The get_group_n function is helpful for calculating the group sample size and returning it in a presentation-ready format.
# Same as above, add n's to table
table <- tibble(
  variable = "",
  class    = "",
  am_0     = mtcars %>% bfuncs::get_group_n(am == 0),
  am_1     = mtcars %>% bfuncs::get_group_n(am == 1)
)
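If bfuncs is not installed, the same idea can be sketched with dplyr alone. The helper group_n_label below is hypothetical, and the "N = " label it returns is an assumption made for illustration; it is not meant to reproduce the exact output of get_group_n.

```r
library(dplyr)
library(tibble)

data(mtcars)

# Hypothetical stand-in for bfuncs::get_group_n(); the "N = " label
# is an assumption made for illustration only
group_n_label <- function(df, ...) {
  n <- df %>% filter(...) %>% nrow()
  paste0("N = ", n)
}

# Table shell with the group sample sizes in the first row
shell <- tibble(
  variable = "",
  class    = "",
  am_0     = group_n_label(mtcars, am == 0),
  am_1     = group_n_label(mtcars, am == 1)
)

print(shell)
```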
The descriptive analysis vignette gives examples for calculating common descriptive statistics for continuous variables. For example:
mtcars %>% group_by(am) %>% mean_table(mpg)
Although the statistics we need are there, they aren't in a form that can easily be put into a Word table. To help with that process, I've created the format_table function.
The format_table function is an S3 generic. It currently has methods for formatting the output of the freq_table and mean_table functions. Examples of how to use format_table are given here and below.
mtcars %>% mean_table(mpg) %>% format_table()
format_table changes the number of digits displayed, but it does not change the underlying rounding of the value. That must be changed with the digits argument to mean_table.
mtcars %>% mean_table(mpg) %>% format_table(stats = "n and mean")
mtcars %>% group_by(am) %>% mean_table(mpg) %>% format_table()
mtcars %>% group_by(am) %>% mean_table(mpg) %>% format_table(stats = "n and mean")
This section is similar to the previous, except we are now analyzing categorical variables.
mtcars %>% group_by(am) %>% freq_table() %>% format_table()
mtcars %>% group_by(am) %>% freq_table() %>% format_table(stats = "n and percent")
mtcars %>% group_by(am, cyl) %>% freq_table() %>% format_table()
mtcars %>% group_by(am, cyl) %>% freq_table(output = "all") %>% format_table(stats = "n and percent")
At this point, the statistics of interest are calculated and the values themselves are formatted. If we are not comparing two or more groups in the current table, we may be able to skip this section and go straight to filling in the table shell.
However, if we are comparing two or more groups, then some additional formatting is needed. Currently, our groups make up the rows of the formatted table. Typically, we want the groups to make up the columns of the formatted table. I've found tidyr::spread() to be really useful for fixing this problem. For example:
mtcars %>% group_by(am, cyl) %>% freq_table() %>% format_table() %>% spread(key = row_cat, value = percent_row_95)
Where:
Key = the variable whose levels make up our comparison groups of interest
Value = the values we are interested in comparing
We are getting closer to having this in a form that can be used to fill out our table shell. However, we still need to assign the name of the first column (cyl) as a value to a new column named "variable" and we need to change the names of the existing variables to "class", am_0, and am_1 to match our table shell.
mtcars %>%
  group_by(am, cyl) %>%
  freq_table() %>%
  format_table() %>%
  spread(key = row_cat, value = percent_row_95) %>%
  select(-row_var) %>%
  # Rename to row bind with table shell
  rename(variable = col_var, class = col_cat, "am_0" = `0`, "am_1" = `1`)
There may be better ways to do this, but this is the best I've found so far.
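One possible alternative: in tidyr 1.0.0 and later, spread() is superseded by pivot_wider(), whose names_prefix argument can create the am_0 and am_1 column names directly. The sketch below uses a small hand-made tibble in place of the format_table output. Its column names are taken from the code above, but the assumption that these are the only columns, and the placeholder values, are mine.

```r
library(dplyr)
library(tidyr)
library(tibble)

# Toy data shaped like the formatted output above. The values are
# placeholders, not real results.
formatted <- tibble(
  row_var        = "am",
  row_cat        = c(0, 0, 0, 1, 1, 1),
  col_var        = "cyl",
  col_cat        = c(4, 6, 8, 4, 6, 8),
  percent_row_95 = c("a0", "b0", "c0", "a1", "b1", "c1")
)

wide <- formatted %>%
  # Move the comparison groups from rows to columns and prefix the
  # new column names with "am_"
  pivot_wider(
    names_from   = row_cat,
    values_from  = percent_row_95,
    names_prefix = "am_"
  ) %>%
  select(-row_var) %>%
  # Rename to match the table shell
  rename(variable = col_var, class = col_cat)

print(wide)
```

Because the prefix is added while reshaping, the backtick-quoted `0` and `1` names never need to be renamed.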
Again, there are multiple ways to go about completing this step. I will need to build out this section over time. In the current example, the most straightforward method is probably to think of each characteristic of interest as a row in the table. We will stack the rows on top of each other to create the full table using dplyr::bind_rows().
First, an example using just one variable:
row <- mtcars %>%
  group_by(am, cyl) %>%
  freq_table() %>%
  format_table() %>%
  spread(key = row_cat, value = percent_row_95) %>%
  select(-row_var) %>%
  rename(variable = col_var, class = col_cat, "am_0" = `0`, "am_1" = `1`) %>%
  mutate(class = as.character(class)) # Needed for bind_rows below

bind_rows(table, row)
Then, it's trivial to extend this method to multiple variables using a for loop.
# Select variables
cat_vars <- quos(cyl, vs)

# Execute freq_table and row bind results to table
for (i in seq_along(cat_vars)) {

  # Calculate percent and 95% CI
  row <- mtcars %>%
    group_by(am, !!cat_vars[[i]]) %>%
    freq_table() %>%
    format_table() %>%
    spread(key = row_cat, value = percent_row_95) %>%
    select(-row_var) %>%
    rename(variable = col_var, class = col_cat, "am_0" = `0`, "am_1" = `1`) %>%
    mutate(class = as.character(class))

  # Append to bottom of table
  table <- bind_rows(table, row)
}

print(table)
I'm still thinking about potentially more elegant ways to fill in the table shell. It's challenging because there are so many different variations on the state of the data at the beginning of this step and on the format of the table I want to result from this step.
An additional drawback to this method is that intermingling continuous and categorical variables requires multiple for loops. This is probably something worth addressing in the future.
# Reset the table shell
# ---------------------
table <- tibble(
  variable = "",
  class    = "",
  am_0     = mtcars %>% bfuncs::get_group_n(am == 0),
  am_1     = mtcars %>% bfuncs::get_group_n(am == 1)
)

# Select variables
# ----------------
cont_vars <- quos(mpg, disp)
cat_vars  <- quos(cyl, vs)

# Fill in continuous variables
# ----------------------------
for (i in seq_along(cont_vars)) {

  # Calculate mean and 95% CI
  row <- mtcars %>%
    group_by(am) %>%
    bfuncs::mean_table(!!cont_vars[[i]]) %>%
    bfuncs::format_table() %>%
    spread(key = group_cat, value = mean_95) %>%
    select(-group_var) %>%
    rename("variable" = response_var, "am_0" = `0`, "am_1" = `1`)

  # Append to bottom of table
  table <- bind_rows(table, row)
}

# Fill in categorical variables
# -----------------------------
for (i in seq_along(cat_vars)) {

  # Calculate percent and 95% CI
  row <- mtcars %>%
    group_by(am, !!cat_vars[[i]]) %>%
    freq_table() %>%
    format_table() %>%
    spread(key = row_cat, value = percent_row_95) %>%
    select(-row_var) %>%
    rename(variable = col_var, class = col_cat, "am_0" = `0`, "am_1" = `1`) %>%
    mutate(class = as.character(class))

  # Append to bottom of table
  table <- bind_rows(table, row)
}

print(table)
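One way the two loops might be collapsed, so that continuous and categorical variables can appear in any order, is to map over a named vector of variable types. This is only a sketch under simplifying assumptions: cont_row and cat_row are hypothetical helpers built on dplyr and tidyr rather than on mean_table and freq_table, and they report plain means and percentages without the 95% confidence intervals.

```r
library(dplyr)
library(tidyr)
library(purrr)

data(mtcars)

# Hypothetical helper: one row with the group means of a continuous variable
cont_row <- function(df, var) {
  df %>%
    group_by(am) %>%
    summarise(value = sprintf("%.2f", mean(.data[[var]])), .groups = "drop") %>%
    pivot_wider(names_from = am, values_from = value, names_prefix = "am_") %>%
    mutate(variable = var, class = "") %>%
    select(variable, class, am_0, am_1)
}

# Hypothetical helper: one row per class with the within-group percentages
cat_row <- function(df, var) {
  df %>%
    count(am, class = as.character(.data[[var]])) %>%
    group_by(am) %>%
    mutate(value = sprintf("%.1f", 100 * n / sum(n))) %>%
    ungroup() %>%
    select(-n) %>%
    pivot_wider(names_from = am, values_from = value, names_prefix = "am_") %>%
    mutate(variable = var) %>%
    select(variable, class, am_0, am_1)
}

# Variables in the order they should appear in the table
var_types <- c(mpg = "cont", cyl = "cat", disp = "cont", vs = "cat")

# Build each block of rows with the matching helper, then stack them
rows <- var_types %>%
  imap(function(type, var) {
    if (type == "cont") cont_row(mtcars, var) else cat_row(mtcars, var)
  }) %>%
  bind_rows()

print(rows)
```

The result has the same four columns as the table shell, so it can be appended to the shell with bind_rows() in a single call.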
At this point, we basically have the table. Now, we just need to do a couple more things before knitting our Word document.
Improve the row headers
Remove duplicate variable names for categorical variables
Slide classes to the left, under variable names
table <- table %>%
  mutate(
    variable = if_else(variable == "mpg", "Miles per gallon, mean (95% CI)", variable),
    variable = if_else(variable == "disp", "Displacement, mean (95% CI)", variable),
    variable = if_else(variable == "cyl", "Number of cylinders, percent (95% CI)", variable),
    variable = if_else(variable == "vs", "V/S, percent (95% CI)", variable)
  ) %>%
  print
table <- table %>%
  group_by(variable) %>%
  mutate(
    x = duplicated(variable),
    x = if_else(variable == "", NA, x)
  ) %>%
  ungroup() %>%
  mutate(
    variable = if_else(x == TRUE, "", variable),
    variable = if_else(is.na(variable), "", variable),
    x = NULL
  ) %>%
  print
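Because the repeated variable names always sit on consecutive rows in the stacked table, a shorter variant of the de-duplication step might compare each name with the one on the row above it using dplyr::lag(). The sketch below uses a toy tibble with hypothetical contents rather than the real table.

```r
library(dplyr)
library(tibble)

# Toy table with repeated variable names for the categorical rows
toy <- tibble(
  variable = c("", "mpg", "cyl", "cyl", "cyl", "vs", "vs"),
  class    = c("", "", "4", "6", "8", "0", "1")
)

# Blank out a name when it matches the name on the previous row
toy <- toy %>%
  mutate(variable = if_else(variable == lag(variable, default = ""), "", variable))

print(toy)
```

This only blanks consecutive repeats, so it relies on the rows for each variable being stacked together.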
Add tabs in front of classes
For some reason, R automatically strips the leading white space.
The best workaround I can come up with is to add dashes, then find and replace the dashes with white space in Word.
# table %>%
#   mutate(
#     class = stringr::str_replace(class, "^", "---"),
#     variable = if_else(variable == "", class, variable),
#     class = NULL
#   )
table_kable <- knitr::kable(
  table,
  col.names = c(
    "Characteristic", "Class", "Automatic Transmission", "Manual Transmission"
  )
)
print(table_kable)
Typically do some stuff like:
There may be packages now that can do some of this programmatically. Look at officer and wordr.