Usage guidance"

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

Introduction

DescrTab2 is the replacement of the DescrTab package. It supports a variety of different customization options and can be used in .Rmd files in conjunction with knitr.

Preamble settings

DescrTab2 works in your R-console, as well as in .Rmd documents corresponding to output formats of the type pdf_documument, html_document and word_document. It even supports YAML-headers with multiple output formats! For example, if your YAML-header looks like the example below, DescrTab2 should automagically detect the output format depending on the rendering option you choose from the dropdown menue (the arrow next to the "Knit" button on the top menue bar).

---
title: "DescrTab2 tutorial"
output:
  word_document: default
  pdf_document: default
  html_document: default
---

Required LaTeX packages should be loaded automatically as well when rendering as a pdf.

Getting started

Make sure you include the DescrTab2 library by typing

library(DescrTab2)

somewhere in the document before you use it. You are now ready to go!

We will use two tidyverse libraries for data manipulation and a the following dataset for instructive purposes:

library(dplyr, warn.conflicts = FALSE)
library(forcats)
set.seed(123)
dat <- iris[, c("Species", "Sepal.Length")] %>%
  mutate(animal = c("Mammal", "Fish") %>% rep(75) %>% factor()) %>%
  mutate(food = c("fries", "wedges") %>% sample(150, TRUE) %>% factor())
head(dat)

Producing beautiful descriptive tables is now as easy as typing:

descr(dat)

Accessing table elements

The object returned from the descr function is basically just a named list. You may be interested in referencing certain summary statistics from the table in your document. To do this, you can save the list returned by descr:

my_table <- descr(dat)

You can then access the elements of the list using the $ operator.

my_table$variables$Sepal.Length$results$Total$mean

Rstudios autocomplete suggestions are very helpful when navigating this list.

The print function returns a formatted version of this list, which you can also save and access using the same syntax.

my_table <- descr(dat) %>% print(silent=TRUE)

Specifying a group

Use the group option to specify the name of a grouping variable in your data:

descr(dat, "Species")

Assigning labels

Use the group_labels option to assign group labels and the var_labels option to assign variable labels:

descr(dat, "Species", group_labels=list(setosa="My custom group label"), var_labels = list(Sepal.Length = "My custom variable label"))

Assigning a table caption

Use the caption member of the format_options argument to assign a table caption:

descr(dat, "Species", format_options = list(caption="Description of our example dataset."))

Confidence intervals for two group comparisons

For 2-group comparisons, decrtab automatically calculates confidence intervals for differences in effect measures:

descr(dat, "animal")

Different tests

There are a lot of different tests available. Check out the test_choice vignette for details: https://imbi-heidelberg.github.io/DescrTab2/articles/b_test_choice_tree_pdf.pdf

Here are some different tests in action:

descr(dat %>% select(-"Species"), "animal", test_options = list(exact=TRUE, nonparametric=TRUE))
descr(dat %>% select(c("Species", "Sepal.Length")), "Species", test_options = list(nonparametric=TRUE))

Paired observations

In situations with paired data, the group variable usually denotes the timing of the measurement (e.g. "before" and "after" or "time 1", "time 2", etc.). In these scenarios, you need an additional index variable that specifies which observations from the different timepoints should be paired. The test_options =list(paired=TRUE, indices = <Character name of index variable name or vector of indices>) option can be used to specify the pairing indices, see the example below. DescrTab2 only works with data in "long format", see e.g. ?reshape or ?tidyr::pivot_longer for information on how to transoform your data from wide to long format.

descr(dat %>% mutate(animal = fct_recode(animal, Before="Fish", After="Mammal")) %>% select(-"Species"), "animal", test_options = list(paired=TRUE, indices=rep(1:75, each=2)))

descr(dat %>% mutate(animal = fct_recode(animal, Before="Fish", After="Mammal"), idx = rep(1:75, each=2)) %>% select(-"Species"), "animal", test_options = list(paired=TRUE, indices="idx" ))

Significant digits

Every summary statistic in DescrTab2 is formatted by a corresponding formatting function. You can exchange these formatting functions as you please:

descr(dat, "Species", format_summary_stats = list(mean=function(x)formatC(x, digits = 4)) )

Omitting summary statistics

Let's say you don't want to calculate quantiles for your numeric variables. You can specify the summary_stats_cont option to include all summary statistics but quantiles:

descr(dat, "Species", summary_stats_cont = list(N = DescrTab2:::.N, Nmiss = DescrTab2:::.Nmiss, mean =
    DescrTab2:::.mean, sd = DescrTab2:::.sd, median = DescrTab2:::.median, min = DescrTab2:::.min, max =
    DescrTab2:::.max))

Adding summary statistics

Let's say you have a categorical variable, but for some reason it's levels are numerals and you want to calculate the mean. No problem:

# Create example dataset
dat2 <- iris
dat2$cat_var <- c(1,2) %>% sample(150, TRUE) %>% factor()
dat2 <- dat2[, c("Species", "cat_var")]

descr(dat2, "Species", summary_stats_cat=list(mean=DescrTab2:::.factormean))

Combining mean and sd

Use the format_options = list(combine_mean_sd=TRUE) option:

descr(dat, "Species", format_options = c(combine_mean_sd=TRUE))

Omitting p-values

You can declare the format_options = list(print_p = FALSE) option to omit p-values:

descr(dat, "animal", format_options = list(print_p = FALSE))

Similarily for Confidence intervals:

descr(dat, "animal", format_options = list(print_CI = FALSE))

Controling options on a per-variable level

You can use the var_options list to control formatting and test options on a per-variable basis. Let's say in the dataset iris, we want that only the Sepal.Length variable has more digits in the mean and a nonparametric test:

descr(iris, "Species", var_options = list(Sepal.Length = list(
  format_summary_stats = list(
    mean = function(x)
      formatC(x, digits = 4)
  ),
  test_options = c(nonparametric = TRUE)
)))

Use user defined test statistics

DescrTab2 has many predefined significance tests, but sometimes you may need to use a custom test. In this case, you can use the test_override option in test_options (or as a part of per variable options, see above). To do so, test_override must be a list with at least 3 members:

custom_ttest <- list(
  name = "custom t-test",
  abbreviation = "ct",
  p = function(var) {
    return(t.test(var, alternative = "greater")$p.value)
  }
)

descr(iris %>% select(-Species), test_options = list(test_override = custom_ttest))

Supress the last factor level

If you have a lot of binary factors, you may want to suppress one of the factor levels to save space. A common use case for this practise is when you analyse questionaires with a great deal of "yes" / "no" items. You can do so by setting the omit_factor_level option to either "first" or "last".

descr(factor(c("a", "b")), format_options=list(omit_factor_level = "last"))

Confidence intervals as summary statistics

Sometimes it may be a good idea to show the confidence intervals as summary statistics. To do so, you can supply appropriate summary statistics for the confidence intervals with corresponding formatting functions. DescrTab2 offers the following predefined CI functions:

summary_stats_cat <- list(
  CIL = DescrTab2:::.factor_firstlevel_CIlower,
  CIU = DescrTab2:::.factor_firstlevel_CIupper)

summary_stats_cont  <-  list(
  N = DescrTab2:::.N,
  Nmiss = DescrTab2:::.Nmiss,
  mean = DescrTab2:::.mean,
  sd = DescrTab2:::.sd,
  CILM = DescrTab2:::.meanCIlower,
  CIUM = DescrTab2:::.meanCIupper)

format_summary_stats <- list(
  CIL = scales::label_percent(),
  CIU = scales::label_percent(),
  CILM = function(x) format(x, digits = 2, scientific = 3),
  CIUM = function(x) format(x, digits = 2, scientific = 3),
  N = function(x) {
    format(x, digits = 2, scientific = 3)
  },
  mean = function(x) {
    format(x, digits = 2, scientific = 3)
  },
  sd = function(x) {
    format(x, digits = 2, scientific = 3)
  },
  CI = function(x) {
    format(x, digits = 2, scientific = 3)
  }
)

reshape_rows <- list(
  `CI` = list(
    args = c("CIL", "CIU"),
    fun = function(CIL, CIU) {
      paste0("[", CIL, ", ", CIU, "]")
    }
  ),
  `CI` = list(
    args = c("CILM", "CIUM"),
    fun = function(CILM, CIUM) {
      paste0("[", CILM, ", ", CIUM, "]")
    }
  )
)

set.seed(123)
dat <- tibble(a_factor = factor(c(rep("a", 70), rep("b", 30))),
              a_numeric = rnorm(100),
              group = sample(c("Trt", "Ctrl"), 100, TRUE)
)

descr(dat, "group",
  format_options=list(omit_factor_level = "last",
  categories_first_summary_stats_second = FALSE,
  combine_mean_sd = TRUE
  ),
  summary_stats_cat = summary_stats_cat,
  summary_stats_cont = summary_stats_cont,
  reshape_rows = reshape_rows,
  format_summary_stats = format_summary_stats)


Try the DescrTab2 package in your browser

Any scripts or data that you put into this service are public.

DescrTab2 documentation built on Sept. 6, 2022, 9:05 a.m.