tab | R Documentation |
A full-featured function to create, manipulate and format single
cross-tables, using colors to make the printed tab more easily readable
(in the R terminal or exported to Excel with tab_xl).
Since objects of class tabxplor_tab are also of class tibble, you can then use all
dplyr verbs to modify the result, like select, arrange, filter or mutate.
Wrapper around the more powerful tab_many.
tab(
  data,
  row_var,
  col_var,
  tab_vars,
  wt,
  sup_cols,
  pct = "no",
  color = "no",
  OR = "no",
  chi2 = FALSE,
  na = "keep",
  cleannames = NULL,
  other_if_less_than = 0,
  other_level = "Others",
  ref = "auto",
  ref2 = "first",
  comp = "tab",
  ci = "no",
  conf_level = 0.95,
  totaltab = "line",
  totaltab_name = "Ensemble",
  tot = c("row", "col"),
  total_names = "Total",
  add_n = TRUE,
  add_pct = FALSE,
  subtext = "",
  digits = 0,
  filter
)
data |
A data frame. |
row_var , col_var |
The row variable, which will be printed with one level per line, and the column variable, which will be printed with one level per column. For numeric variables means are calculated, in a single column. |
tab_vars |
<tidy-select> Tab variables:
a subtable is made for each combination of levels of the selected variables.
Leave empty to make a simple cross-table. All |
wt |
A weight variable, of class numeric. Leave empty for unweighted results. |
sup_cols |
<tidy-select>
Supplementary column variables, with only the first level printed, and row percentages
(for numeric variables, a mean will be calculated for each |
pct |
The type of percentages to calculate:
|
color |
The type of colors to print, as a single string:
|
OR |
With
|
chi2 |
Set to |
na |
The policy to adopt for missing values, as a single string:
|
cleannames |
Set to |
other_if_less_than |
When set to a positive integer, levels with a count below that value are merged into an "Others" level. |
other_level |
The name of the "Other" level, as a single string. |
ref |
The reference cell to calculate differences and ratios
(used to print
|
ref2 |
A second reference cell is needed to calculate odds ratios
(or relative risk ratios). The first cell of the row or column is used by default.
See |
comp |
The comparison level: by subtables/groups, or for the whole table.
|
ci |
The type of confidence intervals to calculate, passed to
By default, for percentages, Wilson's method is used,
and with |
conf_level |
The confidence level, as a single numeric between 0 and 1. Defaults to 0.95 (95%). |
totaltab |
The total table, if there are subtables/groups
(i.e. when
|
totaltab_name |
The name of the total table, as a single string. |
tot |
The totals:
|
total_names |
The names of the totals, as a character vector of length one or two.
Use syntax of type |
add_n |
For |
add_pct |
Set to |
subtext |
A character vector to print rows of legend under the table. |
digits |
The number of digits to print, as a single integer. To print a different
number of digits for each |
filter |
A |
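The ci argument above states that Wilson's method is used by default for percentages. As a rough illustration of what such an interval looks like for a single unweighted proportion, here is a minimal base R sketch. The helper wilson_ci is hypothetical (it is not part of tabxplor), and it ignores weights and any adjustment the package may apply internally.

```r
# Hypothetical helper, written for illustration only (not a tabxplor function).
# Wilson score interval for x successes out of n trials.
wilson_ci <- function(x, n, conf_level = 0.95) {
  z      <- qnorm(1 - (1 - conf_level) / 2)  # two-sided normal quantile
  p      <- x / n                            # observed proportion
  denom  <- 1 + z^2 / n
  center <- (p + z^2 / (2 * n)) / denom
  half   <- z * sqrt(p * (1 - p) / n + z^2 / (4 * n^2)) / denom
  c(lower = center - half, upper = center + half)
}

# 30 respondents out of 100 in a cell, at the default 95% level
wilson_ci(30, 100)

# A higher confidence level widens the interval
wilson_ci(30, 100, conf_level = 0.99)
```

The same bounds can be obtained from stats::prop.test(30, 100, correct = FALSE)$conf.int, which is a convenient cross-check.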
A tibble of class tab, possibly with colored reading helpers.
All non-text columns are of class fmt, storing all
the data necessary to print formats and colors. Columns with row_var and
tab_vars are of class factor: every added factor will be
considered as a tab_vars and used for grouping. To add text columns without
using them in calculations, be sure they are of class character.
# A simple cross-table:
tab(forcats::gss_cat, marital, race)
# With more variables provided, `tab` makes a subtable for each combination of levels:
tab(forcats::gss_cat, marital, tab_vars = c(year, race))
# You can also add supplementary columns, text or numeric:
tab(dplyr::storms, category, status, sup_cols = c("pressure", "wind"))
# Colors to help the user read the table:
data <- forcats::gss_cat %>%
  dplyr::filter(year %in% c(2000, 2006, 2012), !marital %in% c("No answer", "Widowed"))
gss <- "Source: General social survey 2000-2014"
gss2 <- "Source: General social survey 2000, 2006 and 2012"
# Differences between the cell and its subtable's total cell:
tab(data, race, marital, year, subtext = gss2, pct = "row", color = "diff")
# Differences between the cell and the whole table's general total cell:
tab(data, race, marital, year, subtext = gss2, pct = "row", color = "diff",
    comp = "all")
# Historical differences:
data2 <- data %>% dplyr::mutate(year = as.factor(year))
tab(data2, year, marital, race, subtext = gss2, pct = "row",
    color = "diff", ref = "first", tot = "col")
# Differences with the total, except if their confidence intervals are larger than them:
tab(forcats::gss_cat, race, marital, subtext = gss, pct = "row", color = "diff_ci")
# Same differences, minus their confidence intervals:
tab(forcats::gss_cat, race, marital, subtext = gss, pct = "row", color = "after_ci")
# Contribution of cells to table's variance, like in a correspondence analysis:
tab(forcats::gss_cat, race, marital, subtext = gss, color = "contrib")
# Since the result is a tibble, you can use all dplyr verbs to modify it:
library(dplyr)
tab(dplyr::storms, category, status, sup_cols = c("pressure", "wind")) %>%
  dplyr::filter(category != "-1") %>%
  dplyr::select(-`tropical depression`) %>%
  dplyr::arrange(is_totrow(.), desc(category))
# With `dplyr::arrange`, don't forget to keep the order of tab variables and total rows:
tab(data, race, marital, year, pct = "row") %>%
  dplyr::arrange(year, is_totrow(.), desc(Married))
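The examples above color cells by their difference from the total row (color = "diff") or by their contribution to the table's variance (color = "contrib"). To make those two quantities concrete, the following base R sketch computes them by hand on a small built-in dataset. It is only an assumption about what the colors summarize, using mtcars instead of the survey data, and it does not reproduce tabxplor's own calculations.

```r
# Illustration in base R only; this is not tabxplor code.
counts <- table(cyl = mtcars$cyl, gear = mtcars$gear)

# Row percentages for each row, and for the total row (all rows pooled)
row_pct   <- prop.table(counts, margin = 1) * 100
total_pct <- colSums(counts) / sum(counts) * 100

# Difference between each cell and the total row, in percentage points
diff_from_total <- sweep(row_pct, 2, total_pct)

# Share of each cell in the chi-squared statistic (the shares sum to 1),
# which is how cell contributions are usually read in correspondence analysis
expected <- outer(rowSums(counts), colSums(counts)) / sum(counts)
contrib  <- (counts - expected)^2 / expected
contrib  <- contrib / sum(contrib)

round(diff_from_total, 1)
round(contrib, 3)
```

Positive values in diff_from_total mark cells that are over-represented in their row compared with the whole table, and negative values mark under-represented ones.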