desctable usage vignette (deprecated)

library(desctable)

options(DT.options = list(#scrollX = T,
                          info = F,
                          search = F,
                          dom = "Brtip",
                          fixedColumns = T))
knitr::opts_chunk$set(message = F, warning = F, screenshot.force = F)

desctable is a comprehensive generator of descriptive and comparative tables for R.

Every person doing data analysis has to create tables for descriptive summaries of data (a.k.a. Table.1), or comparative tables.

Many packages, such as the aptly named tableone, address this issue. However, they often include hard-coded behaviors, produce output that is hard to manipulate with standard R tools, or use a dated syntax (e.g. an argument order that makes them awkward to use with the pipe, %>%).

Enter desctable, a package built with the following objectives in mind:


Descriptive tables

Simple usage

desctable uses and exports the pipe operator (%>%) from the magrittr package (of dplyr fame), though using it is not mandatory.

The single interface to the package is its eponymous desctable function.

When used on a data.frame, it returns a descriptive table:

iris %>%
  desctable()

desctable(mtcars)


As you can see with these two examples, desctable describes every variable, with individual levels for factors. It picks statistical functions depending on the type and distribution of the variables in the data, and applies those statistical functions only on the relevant variables.

Output

The object produced by desctable is in fact a list of data.frames, with a "desctable" class.
Methods for reduction to a simple dataframe (as.data.frame, automatically used for printing), conversion to markdown (pander), and interactive html output with DT (datatable) are provided:

iris %>%
  desctable() %>%
  pander()

mtcars %>%
  desctable() %>%
  datatable()


To use pander you need to load the package yourself.

Calls to pander and datatable with "regular" dataframes will not be affected by the defaults used in the package, and you can modify these defaults for desctable objects.

The datatable wrapper function for desctable objects comes with some default options and formatting, such as freezing the row names and table header, export buttons, and rounding of values. Both the pander and datatable wrappers take a digits argument to set the number of decimals to show: pander uses the digits, justify and missing arguments of pandoc.table, whereas datatable calls prettyNum with the digits parameter and removes NA values. You can set digits = NULL if you want the full table and prefer to format it yourself.

Subsequent outputs in this vignette will use DT.

Advanced usage

desctable automatically chooses statistical functions if none is provided, using the following algorithm:

For each variable in the table, compute the relevant statistical functions in that list (non-applicable functions will safely return NA).

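You can specify the statistical functions yourself with the stats argument. This argument can be either an automatic function that selects the appropriate statistical functions, or a named list of functions/formulas.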

The functions/formulas leverage the tidyverse way of working with anonymous functions, i.e.:

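If a function, it is used as is. If a formula, e.g. ~ .x + 1 or ~ . + 1, it is converted to a function. There are three ways to refer to the arguments: use . for a single-argument function; use .x and .y for a two-argument function; and use ..1, ..2, ..3, etc. for more arguments.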

This syntax allows you to create very compact anonymous functions, and is the same as in the map family of functions from purrr.
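As an illustration, the three argument styles can be tried with purrr::as_mapper, which performs this kind of formula-to-function conversion in purrr. This is only a sketch that assumes purrr is installed; the way desctable converts formulas internally may differ, and the function names below are made up for the example.

```r
library(purrr)

# One argument: refer to it as `.` (or `.x`)
add_one <- as_mapper(~ . + 1)
add_one(1:3)

# Two arguments: refer to them as `.x` and `.y`
ratio <- as_mapper(~ .x / .y)
ratio(10, 4)

# More arguments: refer to them by position as `..1`, `..2`, `..3`, ...
first_plus_third <- as_mapper(~ ..1 + ..3)
first_plus_third(1, 2, 3)
```

In a stats list, such a formula would sit next to plain functions, each under the name that should head its column.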

Conditional formulas (condition ~ if_T | if_F) from previous versions are no longer supported!

Automatic function

The default value for the stats argument is stats_auto, provided in the package.

Several "automatic statistical functions" are defined in this package: stats_auto, stats_default, stats_normal, stats_nonnormal.

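You can also provide your own automatic function, which needs to accept a dataframe as its argument (whether it actually uses that dataframe is up to you) and return a named list of statistical functions to use, as described in the following paragraphs.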

# Strictly equivalent to iris %>% desctable() %>% datatable()
iris %>%
  desctable(stats = stats_auto) %>%
  datatable()
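A custom automatic function might look like the sketch below. It assumes the function receives the data frame and returns a named list of statistical functions; my_stats_auto is a hypothetical name, and the selection rule is chosen only for illustration.

```r
# Hypothetical automatic function: always report N, and add the median and
# IQR when the data contains at least one numeric column.
my_stats_auto <- function(data) {
  numeric_cols <- vapply(data, is.numeric, logical(1))
  stats <- list("N" = length)
  if (any(numeric_cols)) {
    stats <- c(stats, list("Median" = stats::median, "IQR" = stats::IQR))
  }
  stats
}

# The names of the returned list become the column headers
names(my_stats_auto(iris))
```

Under these assumptions it would be passed in the same way as stats_auto, i.e. desctable(stats = my_stats_auto).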


For reference, here is the body of the stats_auto function in the package:

print(stats_auto)


Statistical functions

Statistical functions can be any function defined in R that you want to use, such as length or mean.

The only condition is that they return a single numerical value. One exception is when they return a vector of length 1 + nlevels(x) when applied to factors, as is needed for the percent function.
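To make the factor exception concrete, here is a minimal sketch of a function that returns 1 + nlevels(x) values. level_percent is a hypothetical helper written for this example, not the package's percent function.

```r
# Hypothetical helper: percentage of observations in each factor level.
# The leading NA is assumed to stand for the variable's own row, so the
# result has length 1 + nlevels(x).
level_percent <- function(x) {
  if (!is.factor(x)) return(NA)
  c(NA, as.vector(table(x)) / length(x) * 100)
}

res <- level_percent(iris$Species)
length(res) == 1 + nlevels(iris$Species)
```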

As mentioned above, they need to be used inside a named list, such as

mtcars %>%
  desctable(stats = list("N" = length, "Mean" = mean, "SD" = sd)) %>%
  datatable()


The names will be used as column headers in the resulting table, and the functions will be applied safely on the variables (errors return NA, and for factors the function will be used on individual levels).

Several convenience functions are included in this package.

Be aware that all functions will be used on variables stripped of their NA values! This is necessary for most statistical functions to be useful, and it means that N (length) shows the number of non-missing observations for each variable rather than the number of rows in the dataset.
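The combined effect of NA stripping and safe application can be sketched in base R. safe_stat is a hypothetical wrapper that mimics the behavior described here; it is not the package's internal code.

```r
# Hypothetical wrapper: drop NA values, then return NA instead of failing.
safe_stat <- function(f, x) {
  x <- x[!is.na(x)]
  tryCatch(f(x), error = function(e) NA)
}

x <- c(2, NA, 4)
safe_stat(length, x)             # counts only the two non-missing values
safe_stat(mean, x)               # mean of 2 and 4
safe_stat(median, iris$Species)  # median() fails on a factor, so NA is returned
```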

Labels

It is often the case that variable names are not "pretty" enough to be used as-is in a table.
Although you could still edit the variable labels in the table afterwards using subsetting or string replacement functions, the labels argument provides a dedicated facility for this.

The labels argument is a named character vector associating variable names and labels.
You don't need to provide labels for all the variables, and extra labels will be silently discarded. This allows you to define a "global" labels vector and use it for multiple tables even after variable selections.

mtlabels <- c(mpg  = "Miles/(US) gallon",
              cyl  = "Number of cylinders",
              disp = "Displacement (cu.in.)",
              hp   = "Gross horsepower",
              drat = "Rear axle ratio",
              wt   = "Weight (1000 lbs)",
              qsec = "¼ mile time",
              vs   = "V/S",
              am   = "Transmission",
              gear = "Number of forward gears",
              carb = "Number of carburetors")

mtcars %>%
  dplyr::mutate(am = factor(am, labels = c("Automatic", "Manual"))) %>%
  desctable(labels = mtlabels) %>%
  datatable()
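The lookup rule for labels (unlabeled variables keep their name, unused labels are ignored) can be mimicked in a few lines of base R. apply_labels and all_labels are hypothetical names used only for this sketch, which does not reproduce the package's implementation.

```r
# Hypothetical lookup: use the label when one exists, else keep the name.
apply_labels <- function(vars, labels) {
  has_label <- vars %in% names(labels)
  vars[has_label] <- unname(labels[vars[has_label]])
  vars
}

all_labels <- c(mpg    = "Miles/(US) gallon",
                cyl    = "Number of cylinders",
                unused = "Never shown")

# "wt" has no label and is kept as is; the "unused" label is ignored
apply_labels(c("mpg", "cyl", "wt"), all_labels)
```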



Comparative tables

Simple usage

Creating a comparative table (between groups defined by a factor) using desctable is as easy as creating a descriptive table.

It leverages the group_by function from dplyr:

iris %>%
  group_by(Species) %>%
  desctable() -> iris_by_Species

iris_by_Species


The result is a table containing a descriptive sub-table for each level of the grouping factor (the statistical functions rules are applied to each sub-table independently), with the statistical tests performed, and their p values.

When displayed as a flat dataframe, the grouping header appears in each variable name.

You can also see the grouping headers by inspecting the resulting object, which is a nested list of dataframes, each dataframe being named after the grouping factor and its levels (with sample size for each).

str(iris_by_Species)


You can specify groups based on any variable, not only factors:

# With pander output
mtcars %>%
  group_by(cyl) %>%
  desctable() %>%
  pander()


You can also specify groups based on an expression:

# With datatable output
iris %>%
  group_by(Petal.Length > 5) %>%
  desctable() %>%
  datatable()


Multiple nested groups are also possible:

mtcars %>%
  dplyr::mutate(am = factor(am, labels = c("Automatic", "Manual"))) %>%
  group_by(vs, am, cyl) %>%
  desctable() %>%
  datatable()


In the case of nested groups (a.k.a. sub-group analysis), statistical tests are performed only between the groups of the deepest grouping level.

Statistical tests are automatically selected depending on the data and the grouping factor.

Advanced usage

desctable automatically chooses statistical test functions if none is provided, using the following algorithm:

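You can specify the statistical test functions yourself with the tests argument. This argument can be either an automatic function that selects the appropriate statistical test functions, or a named list of statistical test functions.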

Please note that the statistical test functions must be given as formulas so as to capture the name of the test to display in the table. purrr-style formulas are also accepted, as with the statistical functions. This also allows you to specify optional arguments of such functions, and to work around non-standard test functions (see Statistical test functions).

Automatic function

The default value for the tests argument is tests_auto, provided in the package.

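You can also provide your own automatic function, which needs to accept a variable and a grouping factor as its arguments and return a single-term formula containing a statistical test function.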

This function will be used on every variable and every grouping factor to determine the appropriate test.

# Strictly equivalent to iris %>% group_by(Species) %>% desctable() %>% datatable()
iris %>%
  group_by(Species) %>%
  desctable(tests = tests_auto) %>%
  datatable()
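A custom automatic test function could be sketched as follows. The signature (a variable and a grouping factor, both passed as vectors) is an assumption, my_tests_auto is a hypothetical name, and the selection rule is deliberately simplistic.

```r
# Hypothetical automatic test function. Assumed signature: the variable and
# the grouping factor, as vectors. It returns a single-term formula naming
# the test to use.
my_tests_auto <- function(var, grp) {
  grp <- factor(grp)
  if (is.factor(var)) {
    ~chisq.test
  } else if (nlevels(grp) == 2) {
    ~wilcox.test
  } else {
    ~kruskal.test
  }
}

# Numeric variable, three groups
my_tests_auto(iris$Sepal.Length, iris$Species)
```

Returning a formula rather than the function itself keeps the name of the test available for display in the table.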


For reference, here is the body of the tests_auto function in the package:

print(tests_auto)


Statistical test functions

You can provide a named list of statistical test functions, but here the mechanism is a bit different from the stats argument.

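The list must contain either .auto or .default. .auto must be an automatic function, such as tests_auto, and is used to select a test for every variable. .default must be a single-term formula containing a statistical test function, and that test is used for every variable.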

You can also provide overrides to use specific tests for specific variables.
This is done using list items named as the variable and containing a single-term formula function.

iris %>%
  group_by(Petal.Length > 5) %>%
  desctable(tests = list(.auto   = tests_auto,
                         Species = ~chisq.test)) %>%
  datatable()


mtcars %>%
  dplyr::mutate(am = factor(am, labels = c("Automatic", "Manual"))) %>%
  group_by(am) %>%
  desctable(tests = list(.default = ~wilcox.test,
                         mpg      = ~t.test)) %>%
  datatable()

Here is an example of a purrr-style function:

iris %>%
  group_by(Petal.Length > 5) %>%
  desctable(tests = list(.auto = tests_auto,
                         Petal.Width = ~oneway.test(., var.equal = T)))


As with statistical functions, any statistical test function defined in R can be used.

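The conditions are that the function accepts a formula (variable ~ grouping_variable) as its first positional argument, as most tests such as t.test do, and returns an object with a p.value element.

Several convenience functions are provided: formula versions of chisq.test and fisher.test using generic S3 methods (so the behavior of standard calls to chisq.test and fisher.test is not modified), and ANOVA, a partial application of oneway.test with parameter var.equal = T.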
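The shape of a suitable test function can be checked in base R. The sketch below assumes that a test is called with a formula built from the variable and the grouping factor; anova_equal_var is a hypothetical helper showing what a partial application of oneway.test looks like, not the package's ANOVA function.

```r
var <- mtcars$mpg
grp <- factor(mtcars$am)

# A standard test called through its formula interface returns an "htest"
# object, which carries a p.value element.
res <- t.test(var ~ grp)
res$p.value

# Hypothetical helper: oneway.test with var.equal fixed to TRUE
# (a sketch of a partial application).
anova_equal_var <- function(formula) {
  stats::oneway.test(formula, var.equal = TRUE)
}
anova_equal_var(var ~ grp)$p.value
```

Any function with this shape, wrapped in a single-term formula such as ~t.test, should fit the tests argument as described above.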




desctable documentation built on March 24, 2022, 5:07 p.m.