frq: Frequency table of labelled variables
In strengejacke/sjmisc: Data and Variable Transformation Functions


frq	R Documentation


Description

This function returns a frequency table of labelled vectors as a data frame.

Usage

frq(
  x,
  ...,
  sort.frq = c("none", "asc", "desc"),
  weights = NULL,
  auto.grp = NULL,
  show.strings = TRUE,
  show.na = TRUE,
  grp.strings = NULL,
  min.frq = 0,
  out = c("txt", "viewer", "browser"),
  title = NULL,
  encoding = "UTF-8",
  file = NULL
)

Arguments

`x`	A vector or a data frame. May also be a grouped data frame (see 'Note' and 'Examples').
`...`	Optional, unquoted names of variables that should be selected for further processing. Required if `x` is a data frame (not a vector) and only selected variables from `x` should be processed. You may also use functions like `:` or tidyselect's select-helpers. See 'Examples' or the package vignette.
`sort.frq`	Determines whether categories should be sorted according to their frequencies. Default is `"none"`, so categories are not sorted by frequency. Use `"asc"` or `"desc"` to sort categories in ascending or descending order.
`weights`	Bare name, or name as string, of a variable in `x` that contains the weights applied to all observations. Default is `NULL`, so no weights are used.
`auto.grp`	Numeric value indicating the minimum number of unique values in a variable at which automatic grouping into smaller units is done (see `group_var`). Default is `NULL`, i.e. auto-grouping is off.
`show.strings`	Logical. If `FALSE`, frequency tables for character vectors will not be printed. This is useful when printing frequency tables of all variables in a data frame, where character vectors should be skipped for computational reasons.
`show.na`	Logical, or `"auto"`. If `TRUE`, the output always contains information on missing values, even if variables have no missing values. If `FALSE`, information on missing values is removed from the output. If `show.na = "auto"`, information on missing values is only shown when variables actually have missing values.
`grp.strings`	Numeric. If not `NULL`, string values in character vectors are grouped based on their similarity. See `group_str` and `str_find` for details on grouping, and their `precision` argument for details on how close strings must be to be treated as equal.
`min.frq`	Numeric, indicating the minimum frequency for which a value will be shown in the output (missing values are exempt; whether they are shown depends on `show.na`). Default is `0`, so all value frequencies are shown. All values or categories that occur fewer than `min.frq` times in the data are summarized in a single category labelled `"n < min.frq"` (e.g. `"n < 100"` for `min.frq = 100`).
`out`	Character vector indicating whether the results should be printed to the console (`out = "txt"`), or as an HTML table in the viewer pane (`out = "viewer"`) or browser (`out = "browser"`).
`title`	String used as an alternative title instead of the variable label. If `x` is a grouped data frame, `title` must be a vector with one element per group.
`encoding`	Character vector, indicating the charset encoding used for variable and value labels. Default is `"UTF-8"`. Only used when `out` is not `"txt"`.
`file`	Destination file, if the output should be saved as file. Only used when `out` is not `"txt"`.
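The following base-R sketch illustrates what `weights` and `min.frq` do conceptually. The helpers `weighted_counts()` and `collapse_rare()` are hypothetical and not part of sjmisc; the sketch assumes that a weighted frequency is the sum of the weights per category, and it does not attempt to reproduce `frq()`'s own rounding or output format.

```r
# Hypothetical helper (not part of sjmisc): weighted frequency per category,
# computed as the sum of the weights of all observations in that category.
weighted_counts <- function(x, w) {
  ok <- !is.na(x) & !is.na(w)   # drop observations with missing value or weight
  tapply(w[ok], x[ok], sum)
}

# Hypothetical helper (not part of sjmisc): collapse all categories with
# fewer than `min_frq` observations into one category "n < <min_frq>".
collapse_rare <- function(counts, min_frq) {
  counts <- setNames(as.vector(counts), names(counts))  # table -> named vector
  rare <- counts < min_frq
  if (!any(rare)) return(counts)
  out <- counts[!rare]
  out[paste0("n < ", min_frq)] <- sum(counts[rare])
  out
}

weighted_counts(c("a", "b", "a", NA), c(1, 2, 0.5, 3))  # a = 1.5, b = 2

# mtcars$gear has 15, 12 and 5 cars in levels 3, 4 and 5
collapse_rare(table(mtcars$gear), min_frq = 10)  # 3 = 15, 4 = 12, "n < 10" = 5
```

With `min_frq = 15`, levels 4 and 5 would both fall below the threshold and be merged into `"n < 15"` with 17 cars, which corresponds to the idea behind the `frq(mtcars$gear, min.frq = 15)` example.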

Details

The `...` argument accepts not only variable names or select-helper expressions, but also logical conditions, math operations, or combinations of variables that produce "crosstables". See 'Examples' for more details.
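Conceptually, such an expression is evaluated to a new vector whose frequencies are then counted. The base-R sketch below shows this idea with `table()`; it is an illustration under that assumption, not the way `frq()` is implemented, and the toy data frame `d` here is generated independently of the one in 'Examples'.

```r
# A logical condition evaluates to a TRUE/FALSE vector, which is then counted.
cond_counts <- table(mtcars$cyl == 6)
cond_counts  # FALSE: 25, TRUE: 7

# Pasting variables together yields one category per observed combination,
# i.e. a flattened "crosstable".
set.seed(123)
d <- data.frame(
  var_x = sample(letters[1:3], size = 30, replace = TRUE),
  var_z = sample(LETTERS[8:10], size = 30, replace = TRUE)
)
combo_counts <- table(paste0(d$var_x, d$var_z))

# The same counts appear in the non-empty cells of the two-way table.
cross <- table(d$var_x, d$var_z)
```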

Value

A list of data frames with values, value labels, frequencies, and raw, valid and cumulative percentages of `x`.
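To make the percentage columns concrete, here is a minimal base-R sketch. The function `freq_table()` is hypothetical, its column names are only meant to resemble the printed output, and it assumes that raw percentages are based on all observations (including `NA`), valid percentages on non-missing observations only, and cumulative percentages on the valid ones.

```r
# Hypothetical helper (not sjmisc's code): one row per value, plus an NA row.
freq_table <- function(x) {
  counts <- table(x, useNA = "always")   # NA row is always kept, and comes last
  val <- names(counts)
  n <- as.vector(counts)
  missing <- is.na(val)
  valid_n <- sum(n[!missing])
  raw_prc <- 100 * n / sum(n)                         # denominator: all observations
  valid_prc <- ifelse(missing, NA, 100 * n / valid_n) # denominator: non-missing only
  cum_prc <- ifelse(missing, NA, cumsum(ifelse(missing, 0, valid_prc)))
  data.frame(val = val, frq = n, raw.prc = raw_prc,
             valid.prc = valid_prc, cum.prc = cum_prc)
}

freq_table(c(1, 1, 2, NA))
# raw.prc: 50, 25, 25; valid.prc: 66.7, 33.3, NA; cum.prc: 66.7, 100, NA
```

Using `useNA = "always"` keeps an `NA` row even when nothing is missing, which is similar in spirit to `show.na = TRUE`.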

Note

`x` may also be a grouped data frame (see `group_by`) with up to two grouping variables. In that case, frequency tables are created for each subgroup.
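Splitting a vector by one or two grouping variables and tabulating each part illustrates the idea of per-subgroup tables. This base-R sketch uses `split()` and `table()` on `mtcars` rather than `dplyr::group_by()` and `frq()`, so its output format differs from the package's.

```r
# Two grouping variables (am and vs) define the subgroups;
# drop = TRUE discards combinations that do not occur.
by_group <- split(mtcars$gear, list(am = mtcars$am, vs = mtcars$vs), drop = TRUE)

# One frequency table of gear per subgroup, named after the am/vs combination.
gear_tables <- lapply(by_group, table)
gear_tables
```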

The `print()` method adds a table header with information on the variable label, variable type, total and valid N, and mean and standard deviation. Mean and SD are always printed, even for categorical variables (factors) or character vectors. In that case, values are coerced to a numeric vector to calculate the summary statistics.
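For a factor, such a numeric summary depends on how the values are coerced. This sketch assumes coercion with base R's `as.numeric()`, which returns the internal level codes rather than the labels; `frq()` may use a different coercion, so treat the numbers as illustrative only.

```r
f <- factor(c("low", "high", "high", "mid"), levels = c("low", "mid", "high"))

# as.numeric() on a factor returns level positions: low = 1, mid = 2, high = 3
codes <- as.numeric(f)   # 1 3 3 2

c(mean = mean(codes), sd = sd(codes))   # mean = 2.25, sd about 0.957
```

Because the codes follow the level order, reordering the levels changes the mean and SD even though the data are the same.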

To print tables in markdown or HTML format, use `print_md()` or `print_html()`.

Examples

# simple vector
library(sjmisc)
data(efc)
frq(efc$e42dep)

# with grouped data frames, in a pipe
library(dplyr)
efc %>%
  group_by(e16sex, c172code) %>%
  frq(e42dep)

# show only categories with a minimal amount of frequencies
frq(mtcars$gear)

frq(mtcars$gear, min.frq = 10)

frq(mtcars$gear, min.frq = 15)

# with select-helpers: all variables from the COPE-Index
# (which all have a "cop" in their name)
frq(efc, contains("cop"))

# all variables from column "c161sex" to column "c175empl"
frq(efc, c161sex:c175empl)

# for non-labelled data, variable name is printed,
# and "label" column is removed from output
data(iris)
frq(iris, Species)

# also works on grouped data frames
efc %>%
  group_by(c172code) %>%
  frq(is.na(nur_pst))

# group variables with large range and with weights
efc$weights <- abs(rnorm(n = nrow(efc), mean = 1, sd = .5))
frq(efc, c160age, auto.grp = 5, weights = weights)

# different weight options
frq(efc, c172code, weights = weights)
frq(efc, c172code, weights = "weights")
frq(efc, c172code, weights = efc$weights)
frq(efc$c172code, weights = efc$weights)

# group string values
dummy <- efc[1:50, 3, drop = FALSE]
dummy$words <- sample(
  c("Hello", "Helo", "Hole", "Apple", "Ape",
    "New", "Old", "System", "Systemic"),
  size = nrow(dummy),
  replace = TRUE
)

frq(dummy)
frq(dummy, grp.strings = 2)

#### expressions other than variable names

# logical conditions
frq(mtcars, cyl == 6)

frq(efc, is.na(nur_pst), contains("cop"))

iris %>%
  frq(starts_with("Petal"), Sepal.Length > 5)

# computation of variables "on the fly"
frq(mtcars, (gear + carb) / cyl)

# crosstables
set.seed(123)
d <- data.frame(
  var_x = sample(letters[1:3], size = 30, replace = TRUE),
  var_y = sample(1:2, size = 30, replace = TRUE),
  var_z = sample(LETTERS[8:10], size = 30, replace = TRUE)
)
table(d$var_x, d$var_z)
frq(d, paste0(var_x, var_z))
frq(d, paste0(var_x, var_y, var_z))

strengejacke/sjmisc documentation built on May 16, 2024, 4:07 a.m.