flex_table1: Creates a Descriptive Bivariate Table1 Ready for Publication
In Buedenbender/datscience: Data and Science Utility Functions

flex_table1

R Documentation

Creates a Descriptive Bivariate Table1 Ready for Publication

Description

A convenience function that provides an easy wrapper around its two main engines:

table1 provides a nice API for creating demographics tables from a formula. I basically just extended its p-value function so that it also handles more than two groups (ANOVA), added the option to correct p-values with either Bonferroni or Šidák, and set some sensible defaults to achieve a nice look.
flextable gives all the power to format the table as you please, e.g., conditional formatting (bold for p-values below .05), italic headers, or notes explaining what was done.

Really, all credit should go to these two packages and their developers. My function just provides an easy-to-use API or wrapper around their packages to get a beautiful, publication-ready bivariate comparison Table 1.

Usage

flex_table1(
  str_formula,
  data,
  correct = NA,
  num = NA,
  table_caption = NA,
  ref_correction = TRUE,
  include_teststat = TRUE,
  drop_unused_cats = TRUE,
  PCTexcludeNA = TRUE,
  overall = FALSE,
  ...
)

Arguments

`str_formula`	A string representing a formula, e.g., `"~ Sepal.Length + Sepal.Width | Species"`, used to construct the `table1`.
`data`	The dataset containing the variables for the table1 call (all terms from `str_formula` must be present).
`correct`	Character, default = NA (no correction). Currently available are "bonf" for Bonferroni correction and "sidark" for Šidák correction. If you want any other correction included, open an issue at <https://github.com/Buedenbender/datscience/issues> or contact me via mail. See also the references and the details on correction for multiple comparisons.
`num`	Integer, number of comparisons. If NA, it is determined automatically from the number of terms in the formula.
`table_caption`	Caption for the table; each element of the vector represents a new line. The first line is set in bold face, all additional lines in italics.
`ref_correction`	Boolean, default = TRUE. If TRUE, corrected p-values are referenced in the footnote.
`include_teststat`	Boolean, default = TRUE. If TRUE, two additional columns are included in the table: 1) the test statistic (t, F, or χ²) and 2) the degrees of freedom.
`drop_unused_cats`	Boolean, default = TRUE. If TRUE, categories (i.e., factor levels) with 0 observations are dropped.
`PCTexcludeNA`	Boolean, default = TRUE. Should missing values be excluded from the calculation of percentages? If PCTexcludeNA = TRUE, missing values are excluded.
`overall`	Character or FALSE, default = FALSE. Should the final table also include a column for the totals of the sample? If a character string is provided, it gives the name of the new column (recommendation: "Overall").
`...`	(Optional), Additional arguments that can be passed to `format_flextable` (e.g., fontsize, font ...) or to `serialNext`
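
One way the automatic choice of `num` could work is to count the terms to the left of the `|` in `str_formula`. The sketch below is an assumption about that behaviour, not the package's actual code, and `count_terms` is a hypothetical helper name.

```r
# Hypothetical helper (not part of datscience): counts the terms on the
# left of "|" in a table1-style formula string.
count_terms <- function(str_formula) {
  # Keep only the part before the grouping variable, e.g. "~ a + b "
  left_side <- strsplit(str_formula, "|", fixed = TRUE)[[1]][1]
  # Parse it as a one-sided formula and count its term labels
  f <- stats::as.formula(left_side)
  length(attr(stats::terms(f), "term.labels"))
}

n_two <- count_terms("~ Sepal.Length + Sepal.Width | Species")
n_three <- count_terms("~ Sepal.Length + Sepal.Width + test | Species")
```

Passing `num` explicitly avoids relying on any such counting rule, for instance when only a subset of the tests forms the family of comparisons.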

Details

On Fisher's Exact Test (FET) vs Pearson's χ²-test
Newest feature (as of 07/22): following an excellent post on Cross Validated \insertCite{Harrell_cross_11}{datscience}, the function refrains from using Fisher's exact test (FET) for categorical variables and only applies FET in the rare case that expected cell frequencies do not exceed 1. The reason is that the FET can be extremely resource intensive (and slow) and can have type I error rates below the nominal level \insertCite{Crans2008}{datscience}. Contemporary evidence suggests that Pearson's χ²-test, with the test statistic multiplied by the factor (N-1)/N, is nearly always more accurate than FET and is generally recommended \insertCite{Lydersen2009}{datscience}. Accordingly, we use the N-1 Pearson χ²-test proposed by (E.) Pearson and recommended as the optimum test policy by \insertCite{Campbell2007}{datscience}.
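
The N-1 variant can be sketched in a few lines of base R. This is an illustration under stated assumptions, not the package's internal code: `n_minus_1_chisq` is a hypothetical helper name, and the 2 x 2 table from `mtcars` is only example data.

```r
# Hypothetical helper (not part of datscience): N-1 Pearson chi-squared test.
n_minus_1_chisq <- function(tab) {
  tab <- as.matrix(tab)
  n <- sum(tab)
  # Plain Pearson test without Yates' continuity correction
  pearson <- suppressWarnings(stats::chisq.test(tab, correct = FALSE))
  # Rescale the Pearson statistic by (N - 1) / N
  statistic <- unname(pearson$statistic) * (n - 1) / n
  df <- unname(pearson$parameter)
  list(
    statistic = statistic,
    df = df,
    p.value = stats::pchisq(statistic, df = df, lower.tail = FALSE),
    # Smallest expected count, useful for deciding whether to fall back to FET
    min_expected = min(pearson$expected)
  )
}

tab <- table(mtcars$am, mtcars$vs)
res <- n_minus_1_chisq(tab)
```

The returned `min_expected` value is included so that a caller could apply the fallback rule described above and switch to `stats::fisher.test()` when expected counts are too small.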

On Multiple Comparisons
Let me start with a direct quote: "(..) researchers should not automatically (mindlessly) assume that alpha adjustment is necessary during multiple testing." \insertCite{Rubin2021}{datscience}
Whether, how, and when to correct for multiple comparisons in inferential statistics is still an area of ongoing debate. However, it was recently argued that, to decide for or against a correction, it is essential to differentiate between different forms of multiple comparisons \insertCite{Rubin2021}{datscience}. The types of multiple testing are:

disjunction testing
conjunction testing
individual testing

Correction is primarily adequate in the case of disjunction testing. Please refer to the very well written and laid out original publication for more details. For the use case of this function, one can assume a joint null hypothesis, namely that Group A <...> Group N do not differ. For example, if it is sufficient that the groups differ significantly in at least one characteristic, this would be considered disjunction testing.
However, if we are only interested in the constituent (null) hypotheses (e.g., the groups differ in their highest level of education vs. they differ in their current employment status), it could be categorized as individual testing. Please choose carefully for your individual case. For the typical exploratory bivariate comparison in a sociodemographic Table 1, I consider it to be frequently a case of individual testing; thus the flex_table1() function defaults to applying no correction.
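
For cases where a correction is warranted, the two available options can be sketched as follows. This is a minimal illustration using the textbook formulas, assuming `num` comparisons; `adjust_p` is a hypothetical helper and not the function the package uses internally.

```r
# Hypothetical helper (not part of datscience): Bonferroni and Sidak adjustment.
# The option names mirror the values accepted by the `correct` argument.
adjust_p <- function(p, num = length(p), correct = c("bonf", "sidark")) {
  correct <- match.arg(correct)
  adjusted <- switch(correct,
    bonf = p * num,              # Bonferroni: multiply by number of tests
    sidark = 1 - (1 - p)^num     # Sidak: 1 - (1 - p)^num
  )
  pmin(adjusted, 1)              # adjusted p-values cannot exceed 1
}

p_raw <- c(0.004, 0.03, 0.20)
p_bonf <- adjust_p(p_raw, correct = "bonf")
p_sidak <- adjust_p(p_raw, correct = "sidark")
```

With all p-values in one family, the Bonferroni result matches `stats::p.adjust(p_raw, method = "bonferroni")`, and the Šidák-adjusted values are never larger than the Bonferroni ones.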

Value

A flextable object with an APA-ready Table 1. If a file path is provided, the respective file is also created (e.g., a Word .docx file).

Author(s)

Bjoern Buedenbender

References

\insertAllCited

Examples

## Not run: 
# Comparison of just two Groups
str_formula <- "~ Sepal.Length + Sepal.Width + test | Species"
data <- dplyr::filter(iris, Species %in% c("setosa", "versicolor"))
data$test <- factor(rep(c("Female", "Male"), 50))
table_caption <- c("Table 1", "A test on the Iris Data")
flex_table1(str_formula, data = data, table_caption = table_caption)

# Comparison of Multiple Groups (ANOVA)
str_formula <- "~ Sepal.Length + Sepal.Width + Gender_example | Species"
# Use all three species so that more than two groups are compared
data <- iris
data$Gender_example <- factor(rep(c("Female", "Male"), nrow(data) / 2))
table_caption <- c("Table 1", "A test on the Iris Data")
flex_table1(str_formula, data = data, table_caption = table_caption)

## End(Not run)

Buedenbender/datscience documentation built on Nov. 21, 2022, 11:14 a.m.