freq: Frequency counts and percentages
In tidytlg: Create TLGs using the 'tidyverse'

freq	R Documentation

Frequency counts and percentages

Description

Frequency counts and percentages for a variable by treatment and/or group.

Usage

freq(
  df,
  denom_df = df,
  colvar = NULL,
  tablebyvar = NULL,
  rowvar = NULL,
  rowbyvar = NULL,
  statlist = getOption("tidytlg.freq.statlist.default"),
  decimal = 1,
  nested = FALSE,
  cutoff = NULL,
  cutoff_stat = "pct",
  subset = TRUE,
  descending_by = NULL,
  display_missing = FALSE,
  rowtext = NULL,
  row_header = NULL,
  .keep = TRUE,
  .ord = FALSE,
  pad = TRUE,
  ...
)

Arguments

`df`	(required) dataframe containing records to summarize by treatment.
`denom_df`	(optional) dataframe used for population based denominators (default = `df`).
`colvar`	(required) treatment variable within `df` to summarize by.
`tablebyvar`	(optional) repeat entire table by variable within `df`
`rowvar`	(required) character vector of variables to summarize within the dataframe.
`rowbyvar`	(optional) repeat `rowvar` by variable within `df`
`statlist`	(optional) `statlist` object of stats to keep, of length 1 or 2, specifying the list of statistics and the desired format (e.g. `statlist(c("N", "n (x.x%)"))`) (default = `statlist(c("n (x.x)"))`).
`decimal`	(optional) decimal precision root level default (default = 1).
`nested`	(optional) INTERNAL USE ONLY. The default should not be changed. Switch on when this function is called by `nested_freq()` so we will not include the by variables as part of the group denominators (default = `FALSE`).
`cutoff`	(optional) cutoff threshold, applied to the statistic named in `cutoff_stat`. This can be passed as a numeric cutoff; in that case, any row with a value greater than or equal to the cutoff is preserved and the others are dropped. To use a single column to define the cutoff logic, pass a character value of the form `<colName> >= <value>` and only that column will be used. Conditions on multiple columns can be combined with `&` or `|`, e.g. `col1 >= val1 & col2 >= val2`.
`cutoff_stat`	(optional) The statistic to cut off by, `n` or `pct` (default = `'pct'`).
`subset`	(optional) An R expression that will be passed to a `dplyr::filter()` function to subset the `data.frame`. This is performed on the numerator before any other derivations. Denominators must be preprocessed and passed through using `denom_df`.
`descending_by`	(optional) The column or columns whose counts are used to sort the table in descending order. A named character vector can also be provided to request ascending order, e.g. `c("VarName1" = "asc", "VarName2" = "desc")` would sort by `VarName1` in ascending order and `VarName2` in descending order. In case of a tie in count, or if `descending_by` is not provided, the rows will be sorted alphabetically.
`display_missing`	(optional) Should the "missing" values be displayed? If missing values are displayed, denominators will include missing values. (default = `FALSE`).
`rowtext`	(optional) A character vector used to rename the `label` column. If named, names will give the new level and values will be the replaced value. If unnamed, and the table has only one row, the `rowtext` will rename the label of the row. If the `rowtext` is unnamed, the table has no rows, and there is a subset, the table will be populated with zeros and the label will be the only row.
`row_header`	(optional) A character vector to be added to the table.
`.keep`	(optional) Should the `rowbyvar` and `tablebyvar` be output in the table? If `FALSE`, `rowbyvar` will still be output in the `label` column (default = `TRUE`).
`.ord`	(optional) Should the ordering columns be output with the table? This is useful if a table needs to be merged or reordered in any way after build (default = `FALSE`).
`pad`	(optional) A boolean that controls if levels with zero records should be included in the final table. (default = `TRUE`).
`...`	(optional) Named arguments to be included as columns on the table.
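
The two forms of `cutoff` described above can be pictured with a small base R sketch. This is not tidytlg's implementation: the `counts` table and the helper `keep_rows_any()` are hypothetical, made up for illustration. The sketch assumes, following the argument description, that a numeric cutoff keeps a row when any treatment column is greater than or equal to the threshold, while a `<colName> >= <value>` expression looks at one column only.

```r
# Hypothetical wide table of counts: one row per level, one column per arm.
counts <- data.frame(
  label   = c("HISPANIC OR LATINO", "NOT HISPANIC OR LATINO", "NOT REPORTED"),
  Placebo = c(2, 80, 1),
  Active  = c(6, 75, 2),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep a row if ANY of the count columns meets the threshold.
keep_rows_any <- function(tbl, cols, threshold) {
  meets <- rowSums(tbl[cols] >= threshold) > 0
  tbl[meets, , drop = FALSE]
}

# Numeric cutoff of 5: the first row survives because Active (6) >= 5.
any_col <- keep_rows_any(counts, c("Placebo", "Active"), 5)

# Single-column rule "Placebo >= 3": only the Placebo column is inspected,
# so the first row (Placebo = 2) is dropped.
one_col <- counts[counts$Placebo >= 3, , drop = FALSE]

print(any_col$label)
print(one_col$label)
```

The same row can therefore be kept under one form and dropped under the other, which is worth checking when switching between a numeric and a column-specific cutoff.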

Value

A dataframe of results

Sorting a 'freq' table

By default, a frequency table is sorted by the factor levels of the `rowvar` variable. If the `rowvar` variable isn't a factor, it is sorted alphabetically. This behavior can be modified in two ways. The first is the `char2factor()` function, which offers an interface for discretizing a variable based on a numeric variable, such as `VISITN`. The second is the `descending_by` argument, which sorts by the counts in one or more columns.
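
The difference between these orderings can be sketched in base R. The example below does not call `char2factor()` or `freq()`; it uses `factor()` directly, and the `visits` data frame is hypothetical. It only illustrates why a character variable such as a visit name benefits from levels derived from a numeric companion like `VISITN`, and what ordering by counts looks like.

```r
# Hypothetical visit data: a character label plus its numeric companion.
visits <- data.frame(
  VISIT  = c("Week 2", "Baseline", "Week 10", "Week 2"),
  VISITN = c(2, 0, 10, 2),
  stringsAsFactors = FALSE
)

# Alphabetical order of a plain character variable ("Week 10" lands before
# "Week 2"). method = "radix" makes the result independent of the locale.
alpha_order <- sort(unique(visits$VISIT), method = "radix")

# Factor whose levels follow the numeric companion variable instead.
lvl <- unique(visits$VISIT[order(visits$VISITN)])
visits$VISIT_F <- factor(visits$VISIT, levels = lvl)
factor_order <- levels(visits$VISIT_F)

# Ordering by descending counts, the idea behind `descending_by`.
count_order <- names(sort(table(visits$VISIT_F), decreasing = TRUE))

print(alpha_order)
print(factor_order)
print(count_order[1])
```

Converting `rowvar` to a factor with the desired levels before calling `freq()` is the most direct way to control row order when counts are not the sorting criterion.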

Examples

adsl <- data.frame(
  USUBJID = c("DEMO-101", "DEMO-102", "DEMO-103"),
  RACE = c("WHITE", "BLACK", "ASIAN"),
  SEX = c("F", "M", "F"),
  colnbr = factor(c("Placebo", "Low", "High"))
)

# Unique subject count of a single variable
freq(adsl,
  colvar = "colnbr",
  rowvar = "RACE",
  statlist = statlist("n")
)

# Unique subject count and percent of a single variable
freq(adsl,
  colvar = "colnbr",
  rowvar = "RACE",
  statlist = statlist(c("N", "n (x.x%)"))
)

# Unique subject count of a variable by another variable
freq(adsl,
  colvar = "colnbr",
  rowvar = "RACE",
  rowbyvar = "SEX",
  statlist = statlist("n")
)

# Unique subject count of a variable by another variable using colvar and
# group to define the denominator
freq(adsl,
  colvar = "colnbr",
  rowvar = "RACE",
  rowbyvar = "SEX",
  statlist = statlist("n (x.x%)", denoms_by = c("colnbr", "SEX"))
)

# Keep only records where the count meets the threshold for any column
freq(cdisc_adsl,
  rowvar = "ETHNIC",
  colvar = "TRT01P",
  statlist = statlist("n (x.x%)"),
  cutoff = 5,
  cutoff_stat = "n"
)

# Keep only records where the count meets the threshold for a specific column
freq(cdisc_adsl,
  rowvar = "ETHNIC",
  colvar = "TRT01P",
  statlist = statlist("n (x.x%)"),
  cutoff = "Placebo >= 3",
  cutoff_stat = "n"
)

# Below illustrates how to make the same calls to freq() as above, using
# table and column metadata.

# Unique subject count of a single variable
table_metadata <- tibble::tribble(
  ~anbr, ~func, ~df, ~rowvar, ~statlist, ~colvar,
  1, "freq", "cdisc_adsl", "ETHNIC", statlist("n"), "TRT01PN"
)

generate_results(table_metadata,
  column_metadata = column_metadata,
  tbltype = "type1"
)

# Unique subject count and percent of a single variable
table_metadata <- tibble::tribble(
  ~anbr, ~func, ~df, ~rowvar, ~statlist, ~colvar,
  "1", "freq", "cdisc_adsl", "ETHNIC", statlist(c("N", "n (x.x%)")), "TRT01PN"
)

generate_results(table_metadata,
  column_metadata = column_metadata,
  tbltype = "type1"
)

# Keep only records where the count meets the threshold for any column
table_metadata <- tibble::tibble(
  anbr = "1", func = "freq", df = "cdisc_adsl", rowvar = "ETHNIC",
  statlist = statlist("n (x.x%)"), colvar = "TRT01PN", cutoff = 5,
  cutoff_stat = "n"
)

generate_results(table_metadata,
  column_metadata = column_metadata,
  tbltype = "type1"
)

# Keep only records where the count meets the threshold for a specific column
table_metadata <- tibble::tibble(
  anbr = 1, func = "freq", df = "cdisc_adsl", rowvar = "ETHNIC",
  statlist = statlist("n (x.x%)"), colvar = "TRT01PN",
  cutoff = "col1 >= 3", cutoff_stat = "n"
)

generate_results(table_metadata,
  column_metadata = column_metadata,
  tbltype = "type1"
)

tidytlg documentation built on Dec. 19, 2025, 9:07 a.m.