knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
DescrTab2 is the replacement for the DescrTab package.
It supports a variety of different customization options and can be used
in .Rmd files in conjunction with knitr.
DescrTab2 works in your R console as well as in .Rmd documents rendered to output formats of the type pdf_document, html_document and word_document.
It even supports YAML-headers with multiple output formats!
For example, if your YAML-header looks like the example below, DescrTab2 should automagically detect the output format depending on the rendering option you choose from the dropdown menu (the arrow next to the "Knit" button on the top menu bar).
---
title: "DescrTab2 tutorial"
output:
  word_document: default
  pdf_document: default
  html_document: default
---
Required LaTeX packages should be loaded automatically as well when rendering as a pdf.
Make sure you include the DescrTab2 library by typing
library(DescrTab2)
somewhere in the document before you use it. You are now ready to go!
We will use two tidyverse libraries for data manipulation and the following dataset for instructive purposes:
library(dplyr, warn.conflicts = FALSE)
library(forcats)
set.seed(123)
dat <- iris[, c("Species", "Sepal.Length")] %>%
  mutate(animal = c("Mammal", "Fish") %>% rep(75) %>% factor()) %>%
  mutate(food = c("fries", "wedges") %>% sample(150, TRUE) %>% factor())
head(dat)
Producing beautiful descriptive tables is now as easy as typing:
descr(dat)
The object returned from the descr function is basically just a named list. You may be interested in referencing certain summary statistics from the table in your document. To do this, you can save the list returned by descr:
my_table <- descr(dat)
You can then access the elements of the list using the $ operator.
my_table$variables$Sepal.Length$results$Total$mean
RStudio's autocomplete suggestions are very helpful when navigating this list.
The print function returns a formatted version of this list, which you can also save and access using the same syntax.
my_table <- descr(dat) %>% print(silent=TRUE)
Use the group option to specify the name of a grouping variable in your data:
descr(dat, "Species")
Use the group_labels option to assign group labels and the var_labels option to assign variable labels:
descr(dat, "Species",
      group_labels = list(setosa = "My custom group label"),
      var_labels = list(Sepal.Length = "My custom variable label"))
Use the caption member of the format_options argument to assign a table caption:
descr(dat, "Species", format_options = list(caption="Description of our example dataset."))
For 2-group comparisons, DescrTab2 automatically calculates confidence intervals for differences in effect measures:
descr(dat, "animal")
There are a lot of different tests available. Check out the test_choice vignette for details: https://imbi-heidelberg.github.io/DescrTab2/articles/b_test_choice_tree_pdf.pdf
Here are some different tests in action:
descr(dat %>% select(-"Species"), "animal", test_options = list(exact=TRUE, nonparametric=TRUE))
descr(dat %>% select(c("Species", "Sepal.Length")), "Species", test_options = list(nonparametric=TRUE))
In situations with paired data, the group variable usually denotes the timing of the measurement (e.g. "before" and "after" or "time 1", "time 2", etc.). In these scenarios, you need an additional index variable that specifies which observations from the different timepoints should be paired. The test_options = list(paired=TRUE, indices = <Character name of index variable or vector of indices>) option can be used to specify the pairing indices, see the example below. DescrTab2 only works with data in "long format"; see e.g. ?reshape or ?tidyr::pivot_longer for information on how to transform your data from wide to long format.
# Pairing given as a vector of indices
descr(dat %>%
        mutate(animal = fct_recode(animal, Before = "Fish", After = "Mammal")) %>%
        select(-"Species"),
      "animal",
      test_options = list(paired = TRUE, indices = rep(1:75, each = 2)))

# Pairing given as the name of an index variable in the data
descr(dat %>%
        mutate(animal = fct_recode(animal, Before = "Fish", After = "Mammal"),
               idx = rep(1:75, each = 2)) %>%
        select(-"Species"),
      "animal",
      test_options = list(paired = TRUE, indices = "idx"))
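If your data start out in wide format (one column per time point), the conversion to long format mentioned above might look like the following minimal sketch. It uses only base R's reshape(); the data frame `wide` and its columns `id`, `before` and `after` are made up for illustration and are not part of the example dataset.

```r
# Hypothetical wide-format data: one row per subject, one column per time point.
wide <- data.frame(
  id     = 1:3,
  before = c(5.1, 4.9, 4.7),
  after  = c(5.4, 5.0, 4.6)
)

# Stack the two measurement columns into a single "value" column and
# record which column each value came from in "time".
long <- reshape(
  wide,
  direction = "long",
  varying   = c("before", "after"),
  v.names   = "value",
  timevar   = "time",
  times     = c("before", "after"),
  idvar     = "id"
)

# Make the time point a factor so it can serve as the grouping variable.
long$time <- factor(long$time, levels = c("before", "after"))
rownames(long) <- NULL
long
```

The resulting `long` data frame has one row per subject and time point. It should then be possible to pass it to descr with "time" as the group and indices = "id" in test_options, following the pattern of the index-variable call above.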
Every summary statistic in DescrTab2 is formatted by a corresponding formatting function. You can exchange these formatting functions as you please:
descr(dat, "Species",
      format_summary_stats = list(mean = function(x) formatC(x, digits = 4)))
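A formatting function of this kind takes a number and returns a string, so it can be tried out on its own before you pass it to descr. The sketch below uses only base R and an arbitrary example value. Note that with formatC's default format for doubles, digits counts significant digits rather than decimal places.

```r
x <- 5.843333  # arbitrary example value

# Default format for doubles is "g": digits = number of significant digits.
sig <- formatC(x, digits = 4)
sig
#> [1] "5.843"

# format = "f" switches to fixed notation: digits = number of decimal places.
fixed <- formatC(x, digits = 4, format = "f")
fixed
#> [1] "5.8433"
```

If you want a fixed number of decimal places in the table, the second variant is probably the one to use in format_summary_stats.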
Let's say you don't want to calculate quantiles for your numeric variables. You can specify the summary_stats_cont option to include all summary statistics but quantiles:
descr(dat, "Species",
      summary_stats_cont = list(
        N = DescrTab2:::.N,
        Nmiss = DescrTab2:::.Nmiss,
        mean = DescrTab2:::.mean,
        sd = DescrTab2:::.sd,
        median = DescrTab2:::.median,
        min = DescrTab2:::.min,
        max = DescrTab2:::.max
      ))
Let's say you have a categorical variable, but for some reason its levels are numerals and you want to calculate the mean. No problem:
# Create example dataset
dat2 <- iris
dat2$cat_var <- c(1, 2) %>% sample(150, TRUE) %>% factor()
dat2 <- dat2[, c("Species", "cat_var")]
descr(dat2, "Species", summary_stats_cat = list(mean = DescrTab2:::.factormean))
To show the mean and standard deviation combined in a single row, use the format_options = list(combine_mean_sd=TRUE) option:
descr(dat, "Species", format_options = list(combine_mean_sd = TRUE))
You can declare the format_options = list(print_p = FALSE) option to omit p-values:
descr(dat, "animal", format_options = list(print_p = FALSE))
Similarly for confidence intervals:
descr(dat, "animal", format_options = list(print_CI = FALSE))
You can use the var_options list to control formatting and test options on a per-variable basis.
Let's say that in the iris dataset, only the Sepal.Length variable should get more digits in the mean and a nonparametric test:
descr(iris, "Species",
      var_options = list(Sepal.Length = list(
        format_summary_stats = list(
          mean = function(x) formatC(x, digits = 4)
        ),
        test_options = c(nonparametric = TRUE)
      )))
DescrTab2 has many predefined significance tests, but sometimes you may need to use a custom test.
In this case, you can use the test_override option in test_options (or as part of the per-variable options, see above). To do so, test_override must be a list with at least 3 members, here name, abbreviation and a function p that returns the p-value:
custom_ttest <- list(
  name = "custom t-test",
  abbreviation = "ct",
  p = function(var) {
    return(t.test(var, alternative = "greater")$p.value)
  }
)
descr(iris %>% select(-Species), test_options = list(test_override = custom_ttest))
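Before handing a custom test to descr, it can help to check the list on its own. The sketch below re-creates the list from the example and runs a small sanity check. The helper check_test_override is hypothetical (it is not part of DescrTab2) and only verifies the three members used in the example.

```r
custom_ttest <- list(
  name = "custom t-test",
  abbreviation = "ct",
  p = function(var) t.test(var, alternative = "greater")$p.value
)

# Hypothetical helper, not part of DescrTab2: checks that the three
# members used in the example are present and of a plausible type.
check_test_override <- function(test) {
  stopifnot(
    is.character(test$name), length(test$name) == 1,
    is.character(test$abbreviation), length(test$abbreviation) == 1,
    is.function(test$p)
  )
  invisible(TRUE)
}

check_test_override(custom_ttest)

# Call the p function directly on one numeric column to confirm
# that it returns a single p-value.
p_value <- custom_ttest$p(iris$Sepal.Length)
p_value
```

Running the p function by hand like this makes it easier to spot mistakes (for example a function that returns the whole test object instead of a number) before they surface inside a table.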
If you have a lot of binary factors, you may want to suppress one of the factor levels to save space.
A common use case for this practice is the analysis of questionnaires with a great deal of "yes" / "no" items.
You can do so by setting the omit_factor_level option to either "first" or "last".
descr(factor(c("a", "b")), format_options=list(omit_factor_level = "last"))
Sometimes it may be a good idea to show the confidence intervals as summary statistics.
To do so, you can supply appropriate summary statistics for the confidence intervals with corresponding formatting functions.
DescrTab2 offers the following predefined CI functions:
DescrTab2:::.meanCIlower
DescrTab2:::.meanCIupper
DescrTab2:::.factor_firstlevel_CIlower
DescrTab2:::.factor_firstlevel_CIupper
DescrTab2:::.factor_lastlevel_CIlower
DescrTab2:::.factor_lastlevel_CIupper
DescrTab2:::.HLCIlower
DescrTab2:::.HLCIupper
summary_stats_cat <- list(
  CIL = DescrTab2:::.factor_firstlevel_CIlower,
  CIU = DescrTab2:::.factor_firstlevel_CIupper
)

summary_stats_cont <- list(
  N = DescrTab2:::.N,
  Nmiss = DescrTab2:::.Nmiss,
  mean = DescrTab2:::.mean,
  sd = DescrTab2:::.sd,
  CILM = DescrTab2:::.meanCIlower,
  CIUM = DescrTab2:::.meanCIupper
)

format_summary_stats <- list(
  CIL = scales::label_percent(),
  CIU = scales::label_percent(),
  CILM = function(x) format(x, digits = 2, scientific = 3),
  CIUM = function(x) format(x, digits = 2, scientific = 3),
  N = function(x) format(x, digits = 2, scientific = 3),
  mean = function(x) format(x, digits = 2, scientific = 3),
  sd = function(x) format(x, digits = 2, scientific = 3),
  CI = function(x) format(x, digits = 2, scientific = 3)
)

# Combine the lower and upper bounds into a single "CI" row
reshape_rows <- list(
  `CI` = list(
    args = c("CIL", "CIU"),
    fun = function(CIL, CIU) {
      paste0("[", CIL, ", ", CIU, "]")
    }
  ),
  `CI` = list(
    args = c("CILM", "CIUM"),
    fun = function(CILM, CIUM) {
      paste0("[", CILM, ", ", CIUM, "]")
    }
  )
)

set.seed(123)
dat <- tibble(
  a_factor = factor(c(rep("a", 70), rep("b", 30))),
  a_numeric = rnorm(100),
  group = sample(c("Trt", "Ctrl"), 100, TRUE)
)

descr(dat, "group",
      format_options = list(
        omit_factor_level = "last",
        categories_first_summary_stats_second = FALSE,
        combine_mean_sd = TRUE
      ),
      summary_stats_cat = summary_stats_cat,
      summary_stats_cont = summary_stats_cont,
      reshape_rows = reshape_rows,
      format_summary_stats = format_summary_stats)
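To see what the formatting and combining functions in this example produce, they can be called by hand. This is a sketch with made-up interval bounds and illustrative helper names (fmt_ci, join_ci). It assumes, without confirmation from the package documentation, that the bounds are formatted before being pasted together.

```r
# Same formatting and combining logic as in the example above,
# under illustrative names.
fmt_ci  <- function(x) format(x, digits = 2, scientific = 3)
join_ci <- function(lower, upper) paste0("[", lower, ", ", upper, "]")

# Made-up bounds of a confidence interval for a mean.
ci_row <- join_ci(fmt_ci(-0.1234), fmt_ci(0.4567))
ci_row
#> [1] "[-0.12, 0.46]"
```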