Introduction to ezsummary 0.2.0

When we run a typical statistical summary on a piece of data, we usually:

This ezsummary package allows you to:

This package is not intended to solve every single summarization problem. The goal is to simplify and speed up 80% of the most common data summarization tasks. For the remaining 20%, one can always use dplyr, tidyr or other tools to get what they want.

This package builds heavily on Hadley's dplyr and tidyr. If you are not familiar with either of them, you may want to read the package vignettes, at least those for dplyr, before you continue.

Sample Data: mtcars

We will use mtcars to demonstrate the functionality of this package because it provides a good amount of both continuous and categorical data, and almost everyone is familiar with it.

dim(mtcars)
head(mtcars)
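
One rough way to see which columns of mtcars behave like categorical variables is to count the distinct values in each column. This is only a base R sketch; the name n_distinct_values is mine and is not part of ezsummary.

```r
# Count the distinct values in every column of mtcars.
# Columns with only a handful of distinct values (cyl, vs, am, gear, carb)
# are the ones summarized as categorical later in this document.
n_distinct_values <- sapply(mtcars, function(x) length(unique(x)))
print(n_distinct_values)
```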

Functions available in ezsummary

Here is a list of functions available in ezsummary:

Note: I picked this order because it is easier to explain this way. If you want a quick start, you can jump to the ezsummary_q() and ezsummary() & var_types() sections.

ezmarkup()

I will start with ezmarkup() because some functionality of the other ezsummary functions depends on it. This function combines two or more columns and formats the result in a customized way. In some ways it is similar to tidyr::unite(), but it offers more flexible formatting of the result (though I admit it's not that well written :P).

In ezmarkup(), we use a dot to indicate a column. If you want to combine two columns, you put them in a pair of [], like [..]. The interesting part is, inside the brackets, you can literally do whatever you want. For example, [. (.)] will put the second column in a pair of () sitting one space after the first column.

library(dplyr)
library(ezsummary)
library(knitr)

mtcars %>% 
  select(1:3) %>%
  ezmarkup(".[. (.)]") %>%
  head()

mtcars %>% 
  select(1:3) %>%
  ezmarkup(".[. ~~.~~ :-)]") %>%
  head()
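
To make the pattern syntax more concrete, here is a base R sketch of what ".[. (.)]" asks for on the first three columns. It does not call ezmarkup(), and the merged column name cyl_disp is my own choice for illustration; the name ezmarkup() gives to a merged column may differ.

```r
# Base R sketch of the pattern ".[. (.)]" applied to the first three columns:
# the first column is kept as is; the second and third are merged into one
# text column formatted as "second (third)".
first_three <- head(mtcars[, 1:3])
combined <- data.frame(
  mpg = first_three$mpg,
  cyl_disp = paste0(first_three$cyl, " (", first_three$disp, ")"),  # hypothetical name
  row.names = rownames(first_three)
)
print(combined)
```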

ezsummary_q()

Preset functions

Let's get back to data summarization. The most common tasks for quantitative analyses have been pre-programmed and you can just use those options to decide whether you want to include them in the analysis. Such pre-programmed options include:

By default, mean and sd are turned on as they are commonly used.

mtcars %>% ezsummary(n = TRUE, quantile = TRUE) %>% kable()

Customized Functions

If you don't see what you want in this list, you can define your own functions through the extra option. Multiple extra functions can be passed in as a named vector. The name of each element is the label for its result column. Each function is written as a string in which the dot stands for the variable. For example, to get the maximum value and the count of records larger than 20, you can use the code below:

mtcars %>% 
  ezsummary(
    extra = c(
      max = "max(., na.rm = TRUE)",
      above20 = "sum(. > 20, na.rm = TRUE)"
    )
  ) %>%
  kable()
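
If the idea of a function written as a string feels unusual, the base R sketch below shows one way such strings could be evaluated, with the dot bound to a single column. The helper eval_with_dot is hypothetical and is not how ezsummary is necessarily implemented; it only illustrates the convention.

```r
# Hypothetical helper (not part of ezsummary): evaluate an expression string
# in which "." stands for the column being summarized.
eval_with_dot <- function(expr, x) {
  eval(parse(text = expr), list(. = x))
}

extra <- c(
  max = "max(., na.rm = TRUE)",
  above20 = "sum(. > 20, na.rm = TRUE)"
)

# Apply every expression to mtcars$mpg; the vector names label the results.
mpg_extra <- sapply(extra, eval_with_dot, x = mtcars$mpg)
print(mpg_extra)
```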

Summarizing by group

We often need to summarize two or more groups of data. Instead of subsetting, you can use dplyr::group_by() together with ezsummary(), ezsummary_q() and ezsummary_c().

mtcars %>%
  group_by(cyl) %>%
  ezsummary(digits = 1) %>%
  kable()
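
For comparison, this is roughly what a grouped summary looks like when done by hand in base R for a single variable. It is a sketch of the task being simplified, not of ezsummary's output format; the object names are mine.

```r
# Mean and standard deviation of mpg within each cyl group, rounded to 1 digit.
mpg_mean <- tapply(mtcars$mpg, mtcars$cyl, mean)
mpg_sd <- tapply(mtcars$mpg, mtcars$cyl, sd)

mpg_by_cyl <- data.frame(
  cyl = names(mpg_mean),
  mean = round(as.vector(mpg_mean), 1),
  sd = round(as.vector(mpg_sd), 1)
)
print(mpg_by_cyl)
```

Doing this for every variable and every statistic quickly becomes repetitive, which is the part group_by() followed by ezsummary() is meant to take over.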

"Wide" format

If you don't want the categorical information to be listed separately as a column, you can use the flavor option (either "long" or "wide"). It calls tidyr::gather() and tidyr::spread() internally and re-sorts the columns into the order you would expect (unlike the default alphabetical sorting of tidyr::spread()).

mtcars %>%
  group_by(cyl) %>%
  ezsummary(flavor = "wide", digits = 1) %>% 
  kable()

Unit Markup

You can also ask ezsummary() to call ezmarkup() internally to combine columns into "Table One" style tables. Because you may not know how many groups there are when you first run ezsummary(), the unit_markup option describes the style you want for a single group, and that style is used for each group.

mtcars %>%
  group_by(carb) %>%
  ezsummary(flavor = "wide", digits = 1, unit_markup = '[. (.)]') %>%
  kable()

Rounding Methods

As demonstrated above, you can use digits to control the number of digits used for rounding. In ezsummary, you can also control the rounding method through the rounding_type option. Available methods are "round" (default), "signif", "ceiling" and "floor". See ?round in R for details.

mtcars %>%
  ezsummary(rounding_type = "ceiling") %>%
  kable()
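
As a reminder of how the four underlying base R functions differ, here they are applied to a single number. This sketch only shows the base functions; how ezsummary combines digits with "ceiling" and "floor" is not covered here.

```r
x <- 3.14159

rounding_demo <- c(
  round = round(x, 2),     # nearest value with 2 decimal places
  signif = signif(x, 2),   # 2 significant digits
  ceiling = ceiling(x),    # smallest integer not less than x
  floor = floor(x)         # largest integer not greater than x
)
print(rounding_demo)
```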

ezsummary_c()

ezsummary_c() is for categorical summarization. Compared with ezsummary_q(), it is very straightforward. It takes most of the options that ezsummary_q() takes. You can also choose between "decimal" and "percent" output through the p_type option.

mtcars %>%
  select(cyl, vs, am, gear, carb) %>%
  ezsummary_c() %>%
  kable()

mtcars %>%
  group_by(cyl) %>%
  select(cyl, vs, am, gear, carb) %>%
  ezsummary_c(p_type = "percent", flavor = "wide", 
              unit_markup = "[. (.)]", digits = 0) %>%
  kable()
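
The difference between "decimal" and "percent" output can be seen with a small base R sketch on one variable. It uses table() and prop.table() rather than ezsummary_c(), so the layout will not match the tables above; the object names are mine.

```r
# Counts of cars per number of cylinders.
cyl_counts <- table(mtcars$cyl)

# Proportions as decimals (they sum to 1) and as whole-number percentages.
cyl_decimal <- prop.table(cyl_counts)
cyl_percent <- round(100 * cyl_decimal, 0)

print(cyl_counts)
print(cyl_decimal)
print(cyl_percent)
```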

ezsummary() & var_types()

You may have noticed that in the ezsummary_q() section I actually used ezsummary() rather than ezsummary_q(). ezsummary() is a wrapper for both ezsummary_q() and ezsummary_c(), and it automatically sorts the options you pass in between the two. It treats every variable as continuous unless it is a character string. The wrapper exists to bring the results for continuous and categorical variables together in one table. To analyze a variable as categorical, declare it with var_types(), which takes a string with one letter, "q" or "c", for each variable to be analyzed.

mtcars %>%
  select(mpg, cyl, disp, gear) %>%
  var_types("qcqc") %>%
  ezsummary(n = TRUE) %>%
  kable()

mtcars %>%
  select(mpg, cyl, disp, gear) %>%
  var_types("qcqc") %>%
  group_by(cyl) %>%
  ezsummary(flavor = "wide", unit_markup = "[. (.)]", 
            p_type = "percent", digits = 1) %>%
  kable(col.names = c("", "4 Cylinders", "6 Cylinders", "8 Cylinders"))
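
The type string is read position by position, one letter per selected variable. The base R sketch below spells out that mapping for "qcqc"; it is my illustration of the convention, not ezsummary's internal code, and the object names are hypothetical.

```r
# The four variables selected above, in order.
vars <- c("mpg", "cyl", "disp", "gear")

# Split "qcqc" into one letter per variable and pair the two up.
types <- strsplit("qcqc", "")[[1]]
var_map <- setNames(types, vars)
print(var_map)

# Variables marked "c" are the ones to be summarized as categorical.
categorical_vars <- names(var_map)[var_map == "c"]
print(categorical_vars)
```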

ezsummary documentation built on May 29, 2017, 1:46 p.m.