tab: Single cross-table, with color helpers
In BriceNocenti/tablr: User-Friendly Tables with Color Helpers for Data Exploration


tab	R Documentation

Single cross-table, with color helpers

Description

A full-featured function to create, manipulate and format single cross-tables, using colors to make the printed table easier to read (in the R terminal, or exported to Excel with tab_xl). Since objects of class tabxplor_tab are also of class tibble, you can then modify the result with any dplyr verb, such as select, arrange, filter or mutate. Wrapper around the more powerful tab_many.

Usage

tab(
  data,
  row_var,
  col_var,
  tab_vars,
  wt,
  sup_cols,
  pct = "no",
  color = "no",
  OR = "no",
  chi2 = FALSE,
  na = "keep",
  cleannames = NULL,
  other_if_less_than = 0,
  other_level = "Others",
  ref = "auto",
  ref2 = "first",
  comp = "tab",
  ci = "no",
  conf_level = 0.95,
  totaltab = "line",
  totaltab_name = "Ensemble",
  tot = c("row", "col"),
  total_names = "Total",
  add_n = TRUE,
  add_pct = FALSE,
  subtext = "",
  digits = 0,
  filter
)

Arguments

`data`	A data frame.
`row_var`, `col_var`	The row variable, which will be printed with one level per line, and the column variable, which will be printed with one level per column. For a numeric variable, means are calculated in a single column.
`tab_vars`	<tidy-select> Tab variables: a subtable is made for each combination of levels of the selected variables. Leave empty to make a simple cross-table. All `tab_vars` are converted to factors.
`wt`	A weight variable, of class numeric. Leave empty for unweighted results.
`sup_cols`	<tidy-select> Supplementary column variables, with only the first level printed, as row percentages (for numeric variables, a mean is calculated for each level of `row_var`). To pass several variables, use the syntax `sup_cols = c(sup_col1, sup_col2, ...)`. To keep all levels of other `col_vars`, or other types of percentages, use `tab_many` instead.
`pct`	The type of percentages to calculate: `"no"`: by default, no percentages are calculated (counts only). `"row"`: row percentages. `"col"`: column percentages. `"all"`: frequencies for each subtable/group, if `tab_vars` are provided. `"all_tabs"`: frequencies for the whole (set of) table(s).
`color`	The type of colors to print, as a single string: `"no"`: by default, no colors are printed. `"diff"`: color percentages and means based on each cell's difference from the total (or from the first cell when `ref = "first"`). `"diff_ci"`: color percentages and means based on each cell's difference from the total or first cell, removing the color when the confidence interval of this difference is larger than the difference itself. `"after_ci"`: same, but subtract the confidence interval from the difference first. `"contrib"`: color cells based on their contribution to variance (except mean columns, from numeric variables). `"OR"`: for `pct = "col"` or `pct = "row"`, color based on odds ratios (or relative risk ratios). `"auto"`: frequencies (`pct = "all"`, `pct = "all_tabs"`) and counts are colored with `"contrib"`. When `ci = "diff"`, row and column percentages are colored with `"after_ci"`; otherwise they are colored with `"diff"`.
`OR`	With `pct = "row"` or `pct = "col"`, calculate and print odds ratios (for binary variables) or relative risk ratios (for variables with 3 or more levels): `"no"`: by default, no OR are calculated. `"OR"`: print OR (instead of percentages). `"OR_pct"`: print OR, with percentages in brackets.
`chi2`	Set to `TRUE` to calculate Chi2 summaries with `tab_chi2`. Useful to print metadata, and to color cells based on their contribution to variance (`color = "contrib"`). Automatically added if needed for `color`.
`na`	The policy to adopt for missing values, as a single string: `"keep"`: by default, `NA`s of row, col and tab variables are printed as an explicit `"NA"` level. `"drop"`: remove `NA`s in row, col and tab variables before calculations are done. Supplementary columns are then calculated for observations with no `NA` in any of the row, col and tab variables.
`cleannames`	Set to `TRUE` to clean level names, by removing prefix numbers like "1-" and text in parentheses. All data formatting arguments are passed to `tab_prepare`.
`other_if_less_than`	When set to a positive integer, levels with a count lower than this value are merged into an "Others" level.
`other_level`	The name of the "Others" level, as a single string.
`ref`	The reference cell used to calculate differences and ratios (used to print `colors`): `"auto"`: by default, the cell's difference from the corresponding total (rows or columns, depending on `pct = "row"` or `pct = "col"`) is used for `diff`; the cell's ratio to the first row (or column) is used for `OR` (odds ratio/relative risk ratio). `"tot"`: totals are always used. `"first"`: calculate the cell's difference or ratio from the first cell of the row or column (useful to color temporal developments). `n`: when `ref` is an integer, the nth row (or column) is used for comparison. `"regex"`: when `ref` is a string, it is used as a regular expression to match the names of the rows (or columns). Be precise enough to match only one row or column, otherwise you get a warning message. `"no"`: do not use a reference or calculate differences, to save computation time.
`ref2`	A second reference cell, needed to calculate odds ratios (or relative risk ratios). By default, the first cell of the row or column is used. See `ref` above for the full list of possible values.
`comp`	The comparison level, by subtables/groups or for the whole table: `"tab"`: by default, contributions to variance, row differences from totals/first cells, and row confidence intervals for these differences are calculated for each `tab_vars` group. `"all"`: compare cells to the general total line (provided there is a total table with a total row), or to the first line of the total table when `ref = "first"`.
`ci`	The type of confidence intervals to calculate, passed to `tab_ci` (automatically added if needed for `color`): `"cell"`: absolute confidence intervals of cell percentages. `"diff"`: confidence intervals of the difference between a cell and the corresponding total cell (or the corresponding first cell when `ref = "first"`). `"auto"`: `ci = "diff"` for means and row/col percentages, `ci = "cell"` for frequencies (`"all"`, `"all_tabs"`). By default, percentages use Wilson's method, and with `ci = "diff"`, Wald's method with Agresti and Caffo's adjustment. Means use the classic method. This can be changed in `tab_many`. By default, with `ci = "cell"`, the result is printed in the `[inf;sup]` form. Set `options("tabxplor.ci_print" = "moe")` to print `pct +- moe` instead.
`conf_level`	The confidence level, as a single number between 0 and 1. Defaults to 0.95 (95%).
`totaltab`	The total table, if there are subtables/groups (i.e. when `tab_vars` is provided): `"line"`: by default, add a general total line (necessary for calculations with `comp = "all"`). `"table"`: add a complete total table (i.e. `row_var` by `col_vars` without `tab_vars`). `"no"`: do not draw any total table.
`totaltab_name`	The name of the total table, as a single string.
`tot`	The totals: `c("col", "row")` or `"both"`: by default, both total rows and total columns. `"row"`: only total rows. `"col"`: only total columns. `"no"`: remove all totals (after calculations if needed).
`total_names`	The names of the totals, as a character vector of length one or two. Use syntax of type `c("Total row", "Total column")` to set different names for rows and cols.
`add_n`	For `pct = "row"` or `pct = "col"`, set to `FALSE` to avoid adding a column or row with unweighted counts (`n`).
`add_pct`	Set to `TRUE` to add a column with the frequencies of the row variable (for `pct = "row"`) or a row with the frequencies of the column variable (for `pct = "col"`).
`subtext`	A character vector to print rows of legend under the table.
`digits`	The number of digits to print, as a single integer. To print a different number of digits for each `sup_cols`, use an integer vector of length 1 + the number of `sup_cols` (the first value being the number of digits for the base table).
`filter`	A `dplyr::filter` to apply to the data frame first, as a single string (which will be converted to code, i.e. to a call). Useful when printing multiple tabs with `tibble::tribble`, to use different filters for similar tables, or simply to make the field of observation more visible in the code.
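To make the `pct` options concrete, here is a base-R sketch of the three kinds of percentages on a hypothetical toy table, using `prop.table()`. It only illustrates the definitions; it is not how `tab` computes them internally, and it ignores weights and subtables.

```r
# Hypothetical toy data: two categorical variables
df <- data.frame(
  sex  = c("F", "F", "F", "M", "M", "M", "M", "M"),
  vote = c("A", "B", "B", "A", "A", "A", "B", "A")
)
counts <- table(df$sex, df$vote)

# Like pct = "row": each row sums to 100
row_pct <- 100 * prop.table(counts, margin = 1)
# Like pct = "col": each column sums to 100
col_pct <- 100 * prop.table(counts, margin = 2)
# Like pct = "all" for a single table: each cell divided by the grand total
all_pct <- 100 * prop.table(counts)
```

With `tab_vars`, `pct = "all"` applies this last division within each subtable, while `pct = "all_tabs"` divides by the grand total of all subtables.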
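The `OR` and `ref2` arguments rely on ratios to a reference row. As a hedged sketch on a hypothetical 2x2 table, using the first row as reference (mirroring the documented `ref2 = "first"` default), the textbook odds ratio and relative risk can be computed as below. How `tab` defines relative risk ratios for variables with 3 or more levels is not covered here.

```r
# Hypothetical counts: rows = groups, columns = outcome
counts <- matrix(c(20, 80,
                   40, 60),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(group   = c("ref", "other"),
                                 outcome = c("yes", "no")))

# Odds of "yes" in each row, then odds ratio of "other" vs the reference row
odds     <- counts[, "yes"] / counts[, "no"]
or_other <- odds[["other"]] / odds[["ref"]]

# Risk (row proportion) of "yes", then relative risk vs the reference row
risk     <- counts[, "yes"] / rowSums(counts)
rr_other <- risk[["other"]] / risk[["ref"]]
```

Here the odds ratio is 8/3 and the relative risk is 2: both are above 1 because "yes" is more frequent in the "other" row than in the reference row.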
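For `chi2 = TRUE` and `color = "contrib"`, a cell's "contribution to variance" is, in the usual correspondence-analysis sense, its share of the table's chi-squared statistic (total inertia being chi-squared divided by the number of observations). The sketch below computes that textbook quantity on a hypothetical table in base R; the exact formula used by `tab_chi2` (for instance, whether contributions are signed) is an assumption not confirmed here.

```r
# Hypothetical 2 x 3 table of counts
observed <- matrix(c(30, 10, 20,
                     10, 20, 10),
                   nrow = 2, byrow = TRUE,
                   dimnames = list(c("r1", "r2"), c("c1", "c2", "c3")))

# Expected counts under independence: row total * column total / grand total
expected  <- outer(rowSums(observed), colSums(observed)) / sum(observed)
# Each cell's term of the chi-squared statistic
cell_chi2 <- (observed - expected)^2 / expected
# Share of the total chi-squared due to each cell (sums to 1)
contrib   <- cell_chi2 / sum(cell_chi2)
# Total inertia, as in correspondence analysis
total_inertia <- sum(cell_chi2) / sum(observed)
```

The sum of `cell_chi2` is the same statistic that `chisq.test(observed)` reports for this table.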
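The two `na` policies can be mimicked on a single factor in base R, as a rough sketch on hypothetical data (`tab` also applies the policy jointly across row, col and tab variables, which this does not show):

```r
x <- factor(c("a", "b", NA, "a"))

# Like na = "keep": NA becomes an explicit level and is counted
keep_counts <- table(addNA(x))
# Like na = "drop": observations with NA are removed before counting
drop_counts <- table(x[!is.na(x)])
```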
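The `cleannames` behaviour (removing prefix numbers like "1-" and text in parentheses) can be approximated with two regular expressions. `clean_levels()` below is a hypothetical helper written for illustration, not the function `tab_prepare` actually uses, and its patterns are assumptions.

```r
# Hypothetical helper, not tabxplor code
clean_levels <- function(x) {
  x <- sub("^[0-9]+-", "", x)          # drop a leading number prefix like "1-"
  x <- gsub("\\s*\\([^)]*\\)", "", x)  # drop text in parentheses and the spaces before it
  trimws(x)
}

clean_levels(c("1-Employed (full time)", "2-Unemployed", "Retired (65+)"))
# returns c("Employed", "Unemployed", "Retired")
```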
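A minimal base-R sketch of what `other_if_less_than` and `other_level` describe: levels seen fewer than `min_n` times are merged into a single level. `lump_rare()` is a hypothetical helper that ignores weights; forcats' `fct_lump_min()` offers a similar, maintained implementation.

```r
# Hypothetical helper, not tabxplor code: unweighted counts only
lump_rare <- function(x, min_n, other_level = "Others") {
  x <- as.character(x)
  counts <- table(x)                     # NAs are not counted
  rare <- names(counts)[counts < min_n]  # levels below the threshold
  x[x %in% rare] <- other_level
  factor(x)
}

lumped <- lump_rare(c("a", "a", "a", "b", "c"), min_n = 2)
table(lumped)
```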
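To illustrate `ref = "tot"` versus `ref = "first"` (the differences that drive `color = "diff"`), here is a base-R sketch on hypothetical row percentages. The total row is made up for the example; in `tab` it is computed from the data.

```r
# Hypothetical row percentages: one row per year, one column per answer
row_pct <- rbind(
  `2000` = c(yes = 40, no = 60),
  `2006` = c(yes = 50, no = 50),
  `2012` = c(yes = 55, no = 45)
)
total <- c(yes = 48, no = 52)  # hypothetical total row

# Like ref = "tot": each cell minus the total row, column by column
diff_tot   <- sweep(row_pct, 2, total)
# Like ref = "first": each cell minus the first row (useful for time series)
diff_first <- sweep(row_pct, 2, row_pct[1, ])
```

With `ref = "first"`, the first row is always zero, so colors show change since the first year rather than distance from the average.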
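For `ci`, the documented defaults are Wilson's interval for cell percentages and Wald's interval with Agresti and Caffo's adjustment for differences. The sketch below writes out textbook versions of both in base R; `wilson_ci()` and `agresti_caffo_diff()` are hypothetical helpers. The Wilson bounds match `prop.test(correct = FALSE)`, which returns the same score interval. The difference interval assumes two independent samples, whereas a cell and its total overlap, so `tab_ci` may proceed differently.

```r
# Wilson score interval for a proportion x / n
wilson_ci <- function(x, n, conf_level = 0.95) {
  z      <- qnorm(1 - (1 - conf_level) / 2)
  p      <- x / n
  center <- (p + z^2 / (2 * n)) / (1 + z^2 / n)
  half   <- z * sqrt(p * (1 - p) / n + z^2 / (4 * n^2)) / (1 + z^2 / n)
  c(lower = center - half, upper = center + half)
}

# Agresti-Caffo interval for p1 - p2 (independent samples): add one success
# and one failure to each sample, then apply the Wald formula
agresti_caffo_diff <- function(x1, n1, x2, n2, conf_level = 0.95) {
  z  <- qnorm(1 - (1 - conf_level) / 2)
  p1 <- (x1 + 1) / (n1 + 2)
  p2 <- (x2 + 1) / (n2 + 2)
  se <- sqrt(p1 * (1 - p1) / (n1 + 2) + p2 * (1 - p2) / (n2 + 2))
  c(lower = (p1 - p2) - z * se, upper = (p1 - p2) + z * se)
}

wilson_ci(30, 100)
agresti_caffo_diff(30, 100, 20, 100)
```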
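The `filter` argument takes code as a string. One way to turn such a string into a call and evaluate it against a data frame is base R's `str2lang()` (R >= 3.6.0), sketched below on hypothetical data. `tab` itself may rely on a different mechanism (for example rlang) before passing the condition to `dplyr::filter`.

```r
df <- data.frame(year = c(2000, 2006, 2012), n = c(10, 20, 30))
filter_string <- "year >= 2006"  # hypothetical filter, as it would be passed to `filter`

# Parse the string into a call, then evaluate it with the columns in scope
keep <- eval(str2lang(filter_string), envir = df, enclos = parent.frame())
df_filtered <- df[keep, , drop = FALSE]
```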

Value

A tibble of class tab, possibly with colored reading helpers. All non-text columns are of class fmt, storing all the data necessary to print formats and colors. Columns with row_var and tab_vars are of class factor: every added factor column will be considered as a tab_var and used for grouping. To add text columns without using them in calculations, make sure they are of class character.

Examples

# A simple cross-table:
tab(forcats::gss_cat, marital, race)


# With more variables provided, `tab` makes a subtable for each combination of levels:

tab(forcats::gss_cat, marital, tab_vars = c(year, race))


# You can also add supplementary columns, text or numeric:

tab(dplyr::storms, category, status, sup_cols = c("pressure", "wind"))


# Colors to help the user read the table:
data <- forcats::gss_cat %>%
  dplyr::filter(year %in% c(2000, 2006, 2012), !marital %in% c("No answer", "Widowed"))
gss  <- "Source: General social survey 2000-2014"
gss2 <- "Source: General social survey 2000, 2006 and 2012"

# Differences between each cell and its subtable's total cell:

tab(data, race, marital, year, subtext = gss2, pct = "row", color = "diff")


# Differences between the cell and the whole table's general total cell:

tab(data, race, marital, year, subtext = gss2, pct = "row", color = "diff",
  comp = "all")


# Historical differences:

data2 <- data %>% dplyr::mutate(year = as.factor(year))
tab(data2, year, marital, race, subtext = gss2, pct = "row",
    color = "diff", ref = "first", tot = "col")


# Differences from the total, uncolored when their confidence intervals are larger than the differences:
tab(forcats::gss_cat, race, marital, subtext = gss, pct = "row", color = "diff_ci")

# Same differences, minus their confidence intervals:
tab(forcats::gss_cat, race, marital, subtext = gss, pct = "row", color = "after_ci")

# Contribution of cells to the table's variance, as in a correspondence analysis:
tab(forcats::gss_cat, race, marital, subtext = gss, color = "contrib")


# Since the result is a tibble, you can use all dplyr verbs to modify it:

library(dplyr)
tab(dplyr::storms, category, status, sup_cols = c("pressure", "wind")) %>%
  dplyr::filter(category != "-1") %>%
  dplyr::select(-`tropical depression`) %>%
  dplyr::arrange(is_totrow(.), desc(category))



# With `dplyr::arrange`, don't forget to keep the order of tab variables and total rows:
tab(data, race, marital, year, pct = "row") %>%
  dplyr::arrange(year, is_totrow(.), desc(Married))

BriceNocenti/tablr documentation built on April 12, 2025, 12:56 a.m.