freq: Frequency table


View source: R/freq.R

Description

Create a frequency table of a vector or a data.frame. It supports tidyverse's quasiquotation and RMarkdown for reports. With the tidyverse, the easiest approach is: data %>% freq(var).

top_freq can be used to get the top/bottom n items of a frequency table, with counts as names. It respects ties.

Usage

freq(x, ...)

## Default S3 method:
freq(
  x,
  sort.count = TRUE,
  nmax = getOption("max.print.freq"),
  na.rm = TRUE,
  row.names = TRUE,
  markdown = !interactive(),
  digits = 2,
  quote = NULL,
  header = TRUE,
  title = NULL,
  na = "<NA>",
  sep = " ",
  decimal.mark = getOption("OutDec"),
  big.mark = "",
  wt = NULL,
  ...
)

## S3 method for class 'factor'
freq(x, ..., droplevels = FALSE)

## S3 method for class 'matrix'
freq(x, ..., quote = FALSE)

## S3 method for class 'table'
freq(x, ..., sep = " ")

## S3 method for class 'numeric'
freq(x, ..., digits = 2)

## S3 method for class 'Date'
freq(x, ..., format = "yyyy-mm-dd")

## S3 method for class 'hms'
freq(x, ..., format = "HH:MM:SS")

is.freq(f)

top_freq(f, n)

header(f, property = NULL)

## S3 method for class 'freq'
print(
  x,
  nmax = getOption("max.print.freq", default = 10),
  markdown = !interactive(),
  header = TRUE,
  decimal.mark = getOption("OutDec"),
  big.mark = ifelse(decimal.mark != ",", ",", "."),
  ...
)

Arguments

x

vector of any class or a data.frame or table

...

up to nine different columns of x when x is a data.frame or tibble, to calculate frequencies from (see Examples). Also supports quasiquotation.

sort.count

sort on count, i.e. frequencies. This is TRUE by default, except when grouping variables are used.

nmax

number of rows to print. The default, 10, is taken from getOption("max.print.freq"). Use nmax = 0, nmax = Inf, nmax = NULL or nmax = NA to print all rows.

na.rm

a logical value indicating whether NA values should be removed from the frequency table. The header (if set) will always print the number of NAs.

row.names

a logical value indicating whether row indices should be printed as 1:nrow(x)

markdown

a logical value indicating whether the frequency table should be printed in markdown format. This will print all rows (except when nmax is defined) and is the default behaviour in non-interactive R sessions (such as when knitting RMarkdown files).

digits

how many significant digits are to be used for numeric values in the header (not for the items themselves, that depends on getOption("digits"))

quote

a logical value indicating whether or not strings should be printed with surrounding quotes. By default, quotes are printed only around character values that are actually numeric values.

header

a logical value indicating whether an informative header should be printed

title

text to show above the frequency table; by default, it is derived from the variables passed to x

na

a character string that should be used to show empty (NA) values (only useful when na.rm = FALSE)

sep

a character string to separate the terms when selecting multiple columns

decimal.mark

used for prettying (longish) numerical and complex sequences. Passed to prettyNum: that help page explains the details.

big.mark

used for prettying (longish) numerical and complex sequences. Passed to prettyNum: that help page explains the details.

wt

frequency weights. If a variable, computes sum(wt) instead of counting the rows.

droplevels

a logical value indicating whether in factors empty levels should be dropped

format

a character to define the printing format (it supports format_datetime to transform e.g. "d mmmm yyyy" to "%e %B %Y")

f

a frequency table

n

number of top n items to return, use -n for the bottom n items. It will include more than n rows if there are ties.

property

name of a property in the header; if given, the value of that property is returned directly
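
The wt argument above is described as computing sum(wt) instead of counting rows. The following base R sketch illustrates only that idea; it does not call freq() and is not the package's implementation, and the data and the names item, wt, unweighted and weighted are made up for illustration.

```r
# Hypothetical data: each row carries a frequency weight.
item <- c("a", "b", "a", "c", "b", "a")
wt   <- c(2,   1,   3,   5,   1,   1)

# Unweighted: count the rows per item.
unweighted <- table(item)

# Weighted: sum the weights per item instead of counting the rows.
weighted <- tapply(wt, item, sum)

print(unweighted)
print(weighted)
```

With weights, item "c" occurs in a single row but contributes a weighted frequency of 5.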

Details

Frequency tables (or frequency distributions) are summaries of the distribution of values in a sample. With the 'freq' function, you can create univariate frequency tables. Multiple variables will be pasted into one variable, so it forces a univariate distribution.
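
To make "pasted into one variable" concrete, here is a minimal base R sketch of the idea. It is an illustration under assumptions, not the code freq() runs: the data frame df, its columns and the variable names are hypothetical, and the separator mirrors the documented default sep = " ".

```r
# Hypothetical data frame with two categorical columns.
df <- data.frame(
  sex    = c("F", "M", "F", "F", "M"),
  status = c("in", "in", "out", "in", "in"),
  stringsAsFactors = FALSE
)

# Paste the columns into a single variable, separated by a space.
combined <- paste(df$sex, df$status, sep = " ")

# Count the combined values, so the result is still a univariate table.
counts <- sort(table(combined), decreasing = TRUE)
print(counts)
```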

Input can be done in many different ways. Base R methods are:

freq(df$variable)
freq(df[, "variable"])

Tidyverse methods are:

df$variable %>% freq()
df[, "variable"] %>% freq()
df %>% freq("variable")
df %>% freq(variable)

For numeric values of any class, these additional values will all be calculated with na.rm = TRUE and shown in the header:

NOTE: These values are calculated using the same algorithm as used by Minitab and SPSS: p[k] = E[F(x[k])]. See Type 6 on the quantile page.
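
The type 6 definition mentioned in the NOTE differs from R's default (type 7). The base R sketch below compares the two on a small made-up vector; it only demonstrates stats::quantile() and stats::IQR() and makes no claim about how freq() calls them internally.

```r
x <- 1:10
probs <- c(0.25, 0.5, 0.75)

# Type 6: the definition used by Minitab and SPSS.
q6 <- quantile(x, probs = probs, type = 6)

# Type 7: the default of quantile() in R.
q7 <- quantile(x, probs = probs, type = 7)

print(q6)  # 2.75, 5.50, 8.25
print(q7)  # 3.25, 5.50, 7.75

# The interquartile range depends on the type as well.
iqr6 <- IQR(x, type = 6)  # 5.5
iqr7 <- IQR(x, type = 7)  # 4.5
```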

For dates and times of any class, these additional values will be calculated with na.rm = TRUE and shown in the header:

In factors, all factor levels that do not occur in the input data will be dropped by default.

The function top_freq will include more than n rows if there are ties. Use a negative number for n (like n = -3) to select the bottom n values.
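
The tie-handling rule can be sketched in base R as follows. The helper top_items() is hypothetical and is not the implementation of top_freq; it assumes that "respecting ties" means keeping every item whose count equals the count of the n-th item, and it returns items with counts as names to mirror the description above.

```r
# Hypothetical helper: return the n most frequent (n > 0) or least frequent
# (n < 0) items, keeping every item tied with the n-th one.
top_items <- function(x, n) {
  stopifnot(n != 0)
  counts <- table(x)
  if (n > 0) {
    ordered <- sort(counts, decreasing = TRUE)
    cutoff  <- ordered[[min(n, length(ordered))]]
    keep    <- ordered[ordered >= cutoff]
  } else {
    ordered <- sort(counts, decreasing = FALSE)
    cutoff  <- ordered[[min(-n, length(ordered))]]
    keep    <- ordered[ordered <= cutoff]
  }
  # Items as values, counts as names.
  stats::setNames(names(keep), as.vector(keep))
}

x <- c("a", "a", "a", "b", "b", "c", "c", "d")
top2    <- top_items(x, 2)   # "b" and "c" tie for second place: three items
bottom1 <- top_items(x, -1)  # only "d" has the lowest count
print(top2)
print(bottom1)
```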

Value

A data.frame (with an additional class "freq") with five columns: item, count, percent, cum_count and cum_percent.
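
As a rough illustration of how these five columns relate to each other, the base R sketch below builds a plain data.frame with the same column names from a made-up vector. It is not the package's implementation; in particular, storing percent and cum_percent as proportions between 0 and 1 is an assumption made here.

```r
x <- c("male", "female", "male", "male", "female", "other")

# Count each item and sort by descending count.
counts <- sort(table(x), decreasing = TRUE)

tbl <- data.frame(
  item  = names(counts),
  count = as.vector(counts),
  stringsAsFactors = FALSE
)

# Derived columns (assumption: proportions rather than percentages).
tbl$percent     <- tbl$count / sum(tbl$count)
tbl$cum_count   <- cumsum(tbl$count)
tbl$cum_percent <- tbl$cum_count / sum(tbl$count)

print(tbl)
```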

Extending the freq() function

Interested in extending the freq() function with your own class? Add a method like the one below to your package, and optionally define some header info by passing a list to the .add_header parameter, as in the example below for class difftime. This example assumes that you use the roxygen2 package for package development.

#' @method freq difftime
#' @importFrom cleaner freq.default
#' @export
#' @noRd
freq.difftime <- function(x, ...) {
  freq.default(x = x, ...,
               .add_header = list(units = attributes(x)$units))
}

Be sure to call freq.default in your function and not just freq. Also, add cleaner to the Imports: field of your DESCRIPTION file, to make sure that it will be installed with your package, e.g.:

Imports: cleaner

Examples

freq(unclean$gender, markdown = FALSE)

freq(x = clean_factor(unclean$gender, 
                      levels = c("^m" = "Male", 
                                 "^f" = "Female")),
     markdown = TRUE,
     title = "Frequencies of a cleaned version for a markdown report!",
     header = FALSE,
     quote = TRUE)

Example output

Frequency table 

Class:      character
Length:     500
Available:  500 (100%, NA: 0 = 0%)
Unique:     5

Shortest:   1
Longest:    6

     Item      Count   Percent   Cum. Count   Cum. Percent
---  -------  ------  --------  -----------  -------------
1    male        240     48.0%          240          48.0%
2    female      220     44.0%          460          92.0%
3    man          22      4.4%          482          96.4%
4    m            15      3.0%          497          99.4%
5    F             3      0.6%          500         100.0%



**Frequencies of a cleaned version for a markdown report!**   




|   |Item     | Count| Percent| Cum. Count| Cum. Percent|
|:--|:--------|-----:|-------:|----------:|------------:|
|1  |"Male"   |   277|   55.4%|        277|        55.4%|
|2  |"Female" |   223|   44.6%|        500|       100.0%|

cleaner documentation built on June 13, 2021, 5:06 p.m.