freqs: Run frequencies for multiple variables

freqs R Documentation

Run frequencies for multiple variables

Description

Runs frequencies, or another summary statistic selected with stat, for one or more variables in a dataset, with optional weighting.

Usage

freqs(
  dataset,
  ...,
  stat = c("percent", "mean", "median", "min", "max", "quantile", "summary"),
  percentile = NULL,
  nas = TRUE,
  wt = NULL,
  prompt = FALSE,
  digits = 2,
  nas_group = TRUE,
  factor_group = FALSE,
  unweighted_ns = FALSE,
  show_missing_levels = TRUE
)

freq(
  dataset,
  ...,
  stat = c("percent", "mean", "median", "min", "max", "quantile", "summary"),
  percentile = NULL,
  nas = TRUE,
  wt = NULL,
  prompt = FALSE,
  digits = 2,
  nas_group = TRUE,
  factor_group = FALSE,
  unweighted_ns = FALSE,
  show_missing_levels = TRUE
)

Arguments

dataset

A dataframe.

...

The unquoted names of a set of variables in the dataset. If nothing is specified, the function runs a frequency on every column in the given dataset.

stat

Character, the statistic to run. Currently accepts 'percent', 'mean', 'median', 'min', 'max', 'quantile', and 'summary' (default: 'percent').

percentile

Double, for use when stat = 'quantile'. Input should be a real number x such that 0 <= x <= 100. It is a percentile rank, that is, a quantile expressed on a 100-point scale (default: NULL).

nas

Boolean, whether or not to include NAs in the tabulation (default: TRUE).

wt

The unquoted name of a weighting variable in the dataset (default: NULL).

prompt

Boolean, whether or not to include the prompt in the dataset (default: FALSE).

digits

Integer, number of significant digits for rounding (default: 2).

nas_group

Boolean, whether or not to include NA values for the grouping variable in the tabulation (default: TRUE).

factor_group

Boolean, whether or not to convert the grouping variable to a factor and use its labels instead of its underlying numeric values (default: FALSE).

unweighted_ns

Boolean, whether the 'n' column in the freqs table should be UNweighted while results ARE weighted. This argument can only be used if a wt variable is used. If no weight variable is used, the 'n' column will always be unweighted (default: FALSE).

show_missing_levels

Boolean, whether to keep response levels with no data (default: TRUE).
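
The interaction between wt and unweighted_ns can be illustrated with a small base-R sketch. This is not the y2clerk implementation; it only assumes that a weighted percent is the sum of weights for a response value divided by the total weight, and the column names n_unweighted, n_weighted, and pct are hypothetical.

```r
# Base-R sketch only; NOT the y2clerk implementation.
df <- data.frame(
  a = c(1, 2, 2, 3, 4, 2, NA),
  weights = c(0.9, 0.9, 1.1, 1.1, 1, 1, 1)
)

# Drop rows where the variable is NA (analogous to nas = FALSE).
complete <- df[!is.na(df$a), ]

# Weighted n: sum of the weights within each response value.
weighted_n <- tapply(complete$weights, complete$a, sum)

# Unweighted n: plain row counts within each response value.
unweighted_n <- as.vector(table(complete$a))

# Weighted share of each response value (assumed definition).
weighted_pct <- weighted_n / sum(weighted_n)

# Hypothetical column names, chosen for illustration only.
result <- data.frame(
  value = as.numeric(names(weighted_n)),
  n_unweighted = unweighted_n,
  n_weighted = as.vector(weighted_n),
  pct = round(as.vector(weighted_pct), 2)
)
print(result)
```

In this sketch the percentages are computed from the weights in both cases; choosing between n_unweighted and n_weighted only changes which count is reported next to them.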

Value

A dataframe with the variable names, prompts, values, labels, counts, stats, and resulting calculations.

Examples

df <- data.frame(
  a = c(1, 2, 2, 3, 4, 2, NA),
  b = c(1, 2, 2, 3, 4, 1, NA),
  weights = c(0.9, 0.9, 1.1, 1.1, 1, 1, 1)
)

freqs(df, a, b)
freqs(df, a, b, wt = weights)
freq(df, stat = 'mean', nas = FALSE)
freq(df, stat = 'mean', nas = FALSE, wt = weights)
df %>%
  dplyr::group_by(a) %>%
  freqs(b, nas = FALSE, wt = weights)

# Note that percentile = 60 returns an estimate of the value
# below which 60% of the observations fall.

# Note also that minimums and maximums are unaffected by weighting.
freqs(df, a, stat = 'min', nas = FALSE)
freqs(df, a, stat = 'median', nas = FALSE)
freqs(df, a, stat = 'quantile', percentile = 95, nas = FALSE)
freqs(df, a, stat = 'summary', nas = FALSE, wt = weights)
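
The percentile note above can be checked against base R. The sketch below is unweighted and assumes that a percentile rank p corresponds to the quantile probability p / 100; it uses stats::quantile with its default interpolation, which may differ from the estimator freqs uses.

```r
# Base-R sketch only; assumes percentile maps to probs = percentile / 100.
a <- c(1, 2, 2, 3, 4, 2, NA)
percentile <- 60

# na.rm = TRUE plays the role of nas = FALSE in this sketch.
q60 <- quantile(a, probs = percentile / 100, na.rm = TRUE, names = FALSE)

# Minimum and maximum depend only on which values are present,
# so weights cannot change them.
a_min <- min(a, na.rm = TRUE)
a_max <- max(a, na.rm = TRUE)

print(c(min = a_min, q60 = q60, max = a_max))
```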

y2analytics/y2clerk documentation built on Feb. 28, 2025, 5:47 p.m.