tab_many: Many cross-tables as one, with color helpers

tab_many

R Documentation

Many cross-tables as one, with color helpers

Description

A full-featured function to create, manipulate and format many cross-tables as one, using colors to make the printed tab more easily readable (in R terminal or exported to Excel with tab_xl). Since objects of class tabxplor_tab are also of class tibble, you can then use all dplyr verbs to modify the result, like select, arrange, filter or mutate.

Only breaks for attractions/over-representations (in green) should be given, as a vector of positive doubles, with length between 1 and 5. Breaks for aversions/under-representations (in orange/red) will simply be the opposite.

Usage

tab_many(
  data,
  row_vars,
  col_vars,
  tab_vars,
  wt,
  pct = "no",
  color = "no",
  OR = "no",
  chi2 = FALSE,
  na = "keep",
  levels = "all",
  na_drop_all,
  cleannames = NULL,
  compact = NULL,
  other_if_less_than = 0,
  other_level = "Others",
  ref = "auto",
  ref2 = "first",
  comp = "tab",
  ci = "no",
  conf_level = 0.95,
  method_cell = "wilson",
  method_diff = "ac",
  totaltab = "line",
  totaltab_name = "Ensemble",
  totrow = TRUE,
  totcol = "last",
  total_names = "Total",
  add_n = TRUE,
  add_pct = FALSE,
  digits = 0,
  subtext = "",
  filter
)

tab_get_vars(tabs, vars = c("row_var", "col_vars", "tab_vars"))

is_tab(x)

set_color_style(
  type = c("text", "bg"),
  theme = NULL,
  html_24_bit = c("blue_red", "green_red", "no"),
  custom_palette = NULL
)

get_color_style(
  mode = c("crayon", "color_code"),
  type = NULL,
  theme = NULL,
  html_24_bit = NULL
)

set_color_breaks(pct_breaks, mean_breaks, contrib_breaks)

get_color_breaks(brk, type = c("positive", "all"))

Arguments

`data`	A data frame.
`row_vars`	The row variable, which will be printed with one level per line. If numeric, it will be converted to factor. If more than one row variable is provided, a different table is made for each of them.
`col_vars`	<tidy-select> One column is printed for each level of each column variable. For numeric variables means are calculated, in a single column. To pass many variables you may use syntax `col_vars = c(col_var1, col_var2, ...)`.
`tab_vars`	<tidy-select> One subtable is made for each combination of levels of the tab variables. To pass many variables you may use syntax `tab_vars = c(tab_var1, tab_var2, ...)`. All tab variables are converted to factor. Leave empty to make a simple table.
`wt`	A weight variable, of class numeric. Leave empty for unweighted results.
`pct`	The type of percentages to calculate: `"row"`: row percentages. `"col"`: column percentages. `"all"`: frequencies for each subtable/group, if there is `tab_vars`. `"all_tabs"`: frequencies for the whole (set of) table(s). The argument is vectorised over both `row_vars` and `col_vars`, so you can write, for example: `pct = list(row_var1 = list("row", "col", "col"), row_var2 = list("col", "row", "row"))`.
`color`	The type of colors to print, as a single string. Vectorised over `row_vars`. `"no"`: by default, no colors are printed. `"diff"`: color percentages and means based on cells differences from totals (or from first cells when `ref = "first"`). `"diff_ci"`: color percentages and means based on cells differences from totals or first cells, removing coloring when the confidence interval of this difference is higher than the difference itself. `"after_ci"`: same as `"diff_ci"`, but the confidence interval is first cut off from the difference. `"contrib"`: color cells based on their contribution to variance (except mean columns, from numeric variables). `"OR"`: for `pct = "col"` or `pct = "row"`, color based on odds ratios (or relative risks ratios). `"auto"`: frequencies (`pct = "all"`, `pct = "all_tabs"`) and counts are colored with `"contrib"`. When `ci = "diff"`, row and col percentages are colored with `"after_ci"`; otherwise they are colored with `"diff"`.
`OR`	With `pct = "row"` or `pct = "col"`, calculate and print odds ratios (for binary variables) or relative risks ratios (for variables with 3 levels or more). `"no"`: by default, no OR are calculated. `"OR"`: print OR (instead of percentages). `"OR_pct"`: print OR, with percentages in bracket.
`chi2`	Set to `TRUE` to calculate Chi2 summaries with `tab_chi2`. Useful to print metadata, and to color cells based on their contribution to variance (`color = "contrib"`). Vectorised over `row_vars`.
`na`	The policy to adopt with missing values. It must be a single string. `na = "keep"`: by default, prints `NA`'s as explicit `"NA"` level. `na = "drop"`: removes `NA` levels before making each table (tabs made with different column variables may have a different number of observations, and won't exactly have the same total columns). `"drop_all"`: remove `NA`'s for all variables before making the tables.
`levels`	The levels of `col_vars` to keep (for more complex selections use `dplyr::select`). The argument is vectorised over `col_vars`. `"all"`: by default, all levels are kept. `"first"`: only keep the first level of each `col_vars`. `"auto"`: keep the first level when a `col_var` has only two levels, keep all levels otherwise.
`na_drop_all`	<tidy-select> Removes all observations with a `NA` in any of the chosen variables, for all tables (tabs for each column variable will have the same number of observations).
`cleannames`	Set to `TRUE` to clean levels names, by removing prefix numbers like "1-", and text in parenthesis. All data formatting arguments are passed to `tab_prepare`.
`compact`	With several `row_vars`, set to `TRUE` to bind all tables in a single `tabxplor_tab`. If not provided, the value of `getOption("tabxplor.compact")` is taken (`FALSE` by default). Set `options(tabxplor.compact = TRUE)` to make this the default behaviour for all tables (but beware, because it can break existing code).
`other_if_less_than`	When set to a positive integer, levels with a count lower than this value are merged into an "Others" level.
`other_level`	The name of the "Other" level, as a single string.
`ref`	The reference cell used to calculate differences and ratios (used to print `colors`): `"auto"`: by default, cell difference from the corresponding total (rows or cols depending on `pct = "row"` or `pct = "col"`) is used for `diff`; cell ratio from the first line (or col) is used for `OR` (odds ratio/relative risks ratio). `"tot"`: totals are always used. `"first"`: calculate cell difference or ratio from the first cell of the row or column (useful to color temporal developments). `n`: when `ref` is an integer, the nth row (or column) is used for comparison. `"regex"`: when `ref` is a string, it is used as a regular expression, to match with the names of the rows (or columns). Be precise enough to match only one column or row, otherwise you get a warning message. `"no"`: do not use a reference and do not calculate differences, to save calculation time.
`ref2`	A second reference cell is needed to calculate odds ratios (or relative risks ratios). The first cell of the row or column is used by default. See `ref` above for the full list of possible values.
`comp`	The comparison level : by subtables/groups, or for the whole table. Vectorised over `row_vars`. `"tab"`: by default, contributions to variance, row differences from totals/first cells, and row confidence intervals for these differences, are calculated for each `tab_vars` group. `"all"`: compare cells to the general total line (provided there is a total table with a total row), or with the reference line of the total table when `ref = "first"`, an integer or a regular expression.
`ci`	The type of confidence intervals to calculate, passed to `tab_ci`. Vectorised over `row_vars`. `"cell"`: absolute confidence intervals of cells percentages. `"diff"`: confidence intervals of the difference between a cell and the relative total cell (or relative first cell when `ref = "first"`). `"auto"`: `ci = "diff"` for means and row/col percentages, `ci = "cell"` for frequencies ("all", "all_tabs"). By default, for percentages, Wilson's method is used with `ci = "cell"`, and Wald's method with Agresti and Caffo's adjustment is used with `ci = "diff"`. Means use the classic method. This can be changed with `method_cell` and `method_diff`. By default, with `ci = "cell"`, the result is printed in the `[inf;sup]` form. Set `options("tabxplor.ci_print" = "moe")` to print `pct +- moe` instead.
`conf_level`	The confidence level, as a single numeric between 0 and 1. Default to 0.95 (95%).
`method_cell`	Character string specifying which method to use with percentages for `ci = "cell"`. This can be one out of: "wald", "wilson", "wilsoncc", "agresti-coull", "jeffreys", "modified wilson", "modified jeffreys", "clopper-pearson", "arcsine", "logit", "witting", "pratt", "midp", "lik" and "blaker". Defaults to "wilson". See `BinomCI`.
`method_diff`	Character string specifying which method to use with percentages for `ci = "diff"`. This can be one out of: "wald", "waldcc", "ac", "score", "scorecc", "mn", "mee", "blj", "ha", "hal", "jp". Defaults to "ac", Wald interval with the adjustment according to Agresti, Caffo for difference in proportions and independent samples. See `BinomDiffCI`.
`totaltab`	The total table, if there are subtables/groups (i.e. when `tab_vars` is provided). Vectorised over `row_vars`. `"line"`: by default, add a general total line (necessary for calculations with `comp = "all"`). `"table"`: add a complete total table (i.e. `row_var` by `col_vars` without `tab_vars`). `"no"`: do not draw any total table.
`totaltab_name`	The name of the total table, as a single string.
`totrow`	By default, total rows are printed. Set to `FALSE` to remove them (after calculations if needed). Vectorised over `row_vars`.
`totcol`	The policy with total columns. Vectorised over `col_vars`. `"last"`: by default, only prints a total column for the last column variable (of class factor, not numeric). `"each"`: print a total column for each column variable. `"no"`: remove all total columns (after calculations if needed).
`total_names`	The names of the totals, as a character vector of length one or two. Use syntax of type `c("Total row", "Total column")` to set different names for rows and cols.
`add_n`	For `pct = "row"` or `pct = "col"`, set to `FALSE` not to add another column or row with unweighted counts (`n`).
`add_pct`	Set to `TRUE` to add a column with the frequencies of the row variable (for `pct = "row"`) or a row with the frequencies of the column variable (for `pct = "col"`).
`digits`	The number of digits to print, as a single integer, or an integer vector the same length as `col_vars`. The argument is vectorised over `col_vars`.
`subtext`	A character vector to print rows of legend under the table.
`filter`	A `dplyr::filter` to apply to the data frame first, as a single string (which will be converted to code, i.e. to a call). Useful when printing multiple tabs with `tibble::tribble`, to use different filters for similar tables or simply to make the field of observation more visible in the code.
`tabs`	A `tibble` of class `tab`, made with `tab`, `tab_many` or `tab_plain`.
`vars`	In `tab_get_vars`, a character vector containing the wanted vars names: `"row_var"`, `"col_vars"` or `"tab_vars"`.
`x`	An object to test with `is_tab`.
`type`	In `get_color_breaks`, defaults to `"positive"`, which just prints breaks for positive spreads. Set to `"all"` to get breaks for negative spreads as well.
`theme`	For `set_color_style` and `get_color_style`, is your console or html table background `"light"` or `"dark"` ? Default to RStudio theme.
`html_24_bit`	Use 24-bit color palettes for html tables: set to `"green_red"` or `"blue_red"`. Only with `mode = "color_code"` (not `mode = "crayon"`) and `theme = "light"`. Default to `getOption("tabxplor.color_html_24_bit")`.
`custom_palette`	Possibility to provide a custom color style, as a character vector of 10 html color codes (the first five for over-represented numbers, the last five for under-represented ones). The result is saved to `options("tabxplor.color_style")`. To discard, relaunch the function with `custom_palette = NULL`.
`mode`	By default, `get_color_style` returns a list of crayon coloring functions. Set to `"color_code"` to return html color codes.
`pct_breaks`	If they are to be changed, the breaks used for percentages. Default to `c(0.05, 0.1, 0.2, 2, 0.3)`: first color used when the pct of a cell is +5% superior to the pct of the related total; second color used when it is +10% superior; third +20% superior; fourth when it is 2 times superior; fifth +30% superior. A break > 1 is read as a ratio, not as a difference. The opposite for cells inferior to the total (without the 2 rule). With `color = "after_ci"`, the first break is subtracted from all breaks (default becomes `c(0, 0.05, 0.15, 2, 0.25)`: +0%, +5%, +15%, *2, +25%).
`mean_breaks`	If they are to be changed, the breaks used for means. Default to `c(1.15, 1.5, 2, 4)`: first color used when the mean of a cell is superior to 1.15 times the mean of the related total row; second color used when it is superior to 1.5 times; etc. The opposite for cells inferior to the total. With `color = "after_ci"`, all breaks are divided by the first break (default becomes `c(1, 1.3, 1.7, 3.5)`).
`contrib_breaks`	If they are to be changed, the breaks used for contributions to variance. Default to `c(1, 2, 5, 10)` : first color used when the contribution of a cell is superior to the mean contribution ; second color used when it is superior to 2 times the mean contribution ; etc. The global color (for example green or red/orange) is given by the sign of the spread.
`brk`	When missing, return all color breaks. Specify to return a given color break, among `"pct"`, `"mean"`, `"contrib"`, `"pct_ci"` and `"mean_ci"`.
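
The `"after_ci"` adjustments described for `pct_breaks` and `mean_breaks` can be checked by hand. The sketch below is plain base R arithmetic that reproduces the documented defaults; it is not the package's internal code, and the variable names are illustrative. Leaving the ratio break (2) untouched is an assumption inferred from the documented result.

```r
# Default breaks, as documented in the arguments above
pct_breaks  <- c(0.05, 0.1, 0.2, 2, 0.3)
mean_breaks <- c(1.15, 1.5, 2, 4)

# Percentages: subtract the first break from every difference-type break.
# Breaks > 1 are ratios and are assumed to stay unchanged.
is_ratio     <- pct_breaks > 1
pct_after_ci <- ifelse(is_ratio, pct_breaks, pct_breaks - pct_breaks[1])

# Means: divide every break by the first one
mean_after_ci <- mean_breaks / mean_breaks[1]

print(pct_after_ci)
print(round(mean_after_ci, 1))
```

Rounded to one decimal, the mean breaks match the documented `c(1, 1.3, 1.7, 3.5)`; the unrounded values are slightly different (for example 1.5 / 1.15 is about 1.304).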

Value

A tibble of class tab, possibly with colored reading helpers. When there are two row_vars or more, a list of tibbles of class tab. All non-text columns are of class fmt, storing all the data necessary to print formats and colors. Columns with row_var and tab_vars are of class factor: every added factor will be considered as a tab_vars and used for grouping. To add text columns without using them in calculations, be sure they are of class character.

tab_get_vars(): a list with the variable names.

is_tab(): a single logical.

set_color_style(): sets the global options "tabxplor.color_style_type" and "tabxplor.color_style_theme", used when printing tab objects.

get_color_style(): a vector of crayon color functions, or a vector of html color codes.

set_color_breaks(): sets the global option "tabxplor.color_breaks" as a list of double vectors, and also returns it invisibly.

get_color_breaks(): the color breaks as a double vector, or a list of double vectors.

Functions

tab_get_vars(): Get the variables names of a tabxplor tab
is_tab(): a test function for class tabxplor_tab
set_color_style(): define the color style used to print tab.
get_color_style(): get color styles as crayon functions or html codes.
set_color_breaks(): set the breaks used to print colors
get_color_breaks(): get the breaks currently used to print colors
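
A minimal sketch of the two inspection helpers, `is_tab()` and `tab_get_vars()`. It assumes the package is installed under the name `tabxplor` (the name used in the option strings on this page) and that `forcats` is available for the `gss_cat` data; both are assumptions, so the code is guarded and does nothing when either package is missing.

```r
# Guarded: runs only if both (assumed) packages are installed
if (requireNamespace("tabxplor", quietly = TRUE) &&
    requireNamespace("forcats", quietly = TRUE)) {
  library(tabxplor)

  # A simple table: one row variable, one column variable, row percentages
  tabs <- tab_many(forcats::gss_cat, race, marital, pct = "row")

  # is_tab() tests the class of the object and returns a single logical
  print(is_tab(tabs))

  # tab_get_vars() returns the names of the row, column and tab variables
  print(tab_get_vars(tabs))
}
```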

Examples

# Make a summary table with many col_vars, showing only one specific level :

library(dplyr)
first_lvs <- c("Married", "$25000 or more", "Strong republican", "Protestant")
data <- forcats::gss_cat %>% mutate(across(
  where(is.factor),
  ~ forcats::fct_relevel(., first_lvs[first_lvs %in% levels(.)])
))
tab_many(data, race, c(marital, rincome, partyid, relig, age, tvhours),
         levels = "first", pct = "row", chi2 = TRUE, color = "auto")


# Can be used with pmap and tribble to program several tables with different parameters
#  all at once, in a readable way:

library(purrr)
library(tibble)
pmap(
  tribble(
    ~row_var, ~col_vars       , ~pct , ~filter              , ~subtext               ,
    "race"  , "marital"       , "row", NULL                 , "Source: GSS 2000-2014",
    "relig" , c("race", "age"), "row", "year %in% 2000:2010", "Source: GSS 2000-2010",
    NA_character_, "race"     , "no" , NULL                 , "Source: GSS 2000-2014",
  ),
  .f = tab_many,
  data = forcats::gss_cat, color = "auto", chi2 = TRUE)

set_color_style(type = "bg")
set_color_breaks(
  pct_breaks = c(0.05, 0.15, 0.3),
  mean_breaks = c(1.15, 2, 4),
  contrib_breaks = c(1, 2, 5)
)
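
Since `set_color_breaks()` changes a global option, it can help to inspect the result and restore the previous setting afterwards. The following is a hedged sketch: it assumes the package is installed as `tabxplor`, relies only on the signatures shown in the Usage section, and uses the option name "tabxplor.color_breaks" given in the Value section. The names `old_breaks` and `pct_all` are illustrative.

```r
# Guarded: runs only if the (assumed) tabxplor package is installed
if (requireNamespace("tabxplor", quietly = TRUE)) {
  library(tabxplor)

  # Remember the current breaks so they can be restored later
  old_breaks <- getOption("tabxplor.color_breaks")

  set_color_breaks(
    pct_breaks     = c(0.05, 0.15, 0.3),
    mean_breaks    = c(1.15, 2, 4),
    contrib_breaks = c(1, 2, 5)
  )

  # With brk missing, all color breaks are returned
  print(get_color_breaks())

  # Breaks for percentages, including those for negative spreads
  pct_all <- get_color_breaks("pct", type = "all")
  print(pct_all)

  # Restore the previous option (NULL removes it again)
  options(tabxplor.color_breaks = old_breaks)
}
```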

BriceNocenti/tablr documentation built on April 12, 2025, 12:56 a.m.