freq | R Documentation |
Create a frequency table of a vector or a data.frame. It supports tidyverse's quasiquotation and RMarkdown for reports. Easiest practice is: data %>% freq(var) using the tidyverse.
top_freq can be used to get the top/bottom n items of a frequency table, with counts as names. It respects ties.
freq(x, ...)

## Default S3 method:
freq(
  x,
  sort.count = TRUE,
  nmax = getOption("max.print.freq"),
  na.rm = TRUE,
  row.names = TRUE,
  markdown = !interactive(),
  digits = 2,
  quote = NULL,
  header = TRUE,
  title = NULL,
  na = "<NA>",
  sep = " ",
  decimal.mark = getOption("OutDec"),
  big.mark = "",
  wt = NULL,
  ...
)

## S3 method for class 'factor'
freq(x, ..., droplevels = FALSE)

## S3 method for class 'matrix'
freq(x, ..., quote = FALSE)

## S3 method for class 'table'
freq(x, ..., sep = " ")

## S3 method for class 'numeric'
freq(x, ..., digits = 2)

## S3 method for class 'Date'
freq(x, ..., format = "yyyy-mm-dd")

## S3 method for class 'hms'
freq(x, ..., format = "HH:MM:SS")

is.freq(f)

top_freq(f, n)

header(f, property = NULL)

## S3 method for class 'freq'
print(
  x,
  nmax = getOption("max.print.freq", default = 10),
  markdown = !interactive(),
  header = TRUE,
  decimal.mark = getOption("OutDec"),
  big.mark = ifelse(decimal.mark != ",", ",", "."),
  ...
)
x |
vector of any class or a |
... |
up to nine different columns of |
sort.count |
sort on count, i.e. frequencies. This will be |
nmax |
number of rows to print. The default, |
na.rm |
a logical value indicating whether |
row.names |
a logical value indicating whether row indices should be printed as |
markdown |
a logical value indicating whether the frequency table should be printed in markdown format. This will print all rows (except when |
digits |
how many significant digits are to be used for numeric values in the header (not for the items themselves, that depends on |
quote |
a logical value indicating whether or not strings should be printed with surrounding quotes. Default is to print them only around characters that are actually numeric values. |
header |
a logical value indicating whether an informative header should be printed |
title |
text to show above the frequency table; by default it tries to coerce from the variables passed to |
na |
a character string that should be used to show empty ( |
sep |
a character string to separate the terms when selecting multiple columns |
decimal.mark |
the character to be used to indicate the numeric decimal point |
big.mark |
character; if not empty used as mark between every 'big.interval' decimals before (hence big) the decimal point |
wt |
frequency weights. If a variable, computes |
droplevels |
a logical value indicating whether in factors empty levels should be dropped |
format |
a character to define the printing format (it supports |
f |
a frequency table |
n |
number of top n items to return; use -n for the bottom n items. It will include more than n rows if there are ties. |
property |
property in header to return this value directly |
Frequency tables (or frequency distributions) are summaries of the distribution of values in a sample. With the 'freq' function, you can create univariate frequency tables. Multiple variables will be pasted into one variable, so it forces a univariate distribution.
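To illustrate what "pasted into one variable" means, here is a minimal base R sketch. It is not the package's own code: the data frame and its column names are hypothetical, and the space separator simply mirrors the default of the sep argument shown in the usage.

```r
# Hypothetical data; the column names are illustrative only
df <- data.frame(
  gender = c("F", "M", "F", "F", "M"),
  ward   = c("ICU", "ICU", "ICU", "ER", "ICU"),
  stringsAsFactors = FALSE
)

# Paste the two columns into a single variable, separated by a space
combined <- paste(df$gender, df$ward, sep = " ")

# Count each combination and sort by descending count
counts <- sort(table(combined), decreasing = TRUE)
print(counts)
```

Each distinct combination (for example "F ICU") is then treated as a single item, which is why the result stays univariate.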
Input can be done in many different ways. Base R methods are:
freq(df$variable)
freq(df[, "variable"])
Tidyverse methods are:
df$variable %>% freq()
df[, "variable"] %>% freq()
df %>% freq("variable")
df %>% freq(variable)
For numeric values of any class, these additional values will all be calculated with na.rm = TRUE and shown in the header:
Mean, using mean
Standard Deviation, using sd
Coefficient of Variation (CV), the standard deviation divided by the mean
Median Absolute Deviation (MAD), using mad
Tukey Five-Number Summaries (minimum, Q1, median, Q3, maximum), see NOTE below
Interquartile Range (IQR) calculated as Q3 - Q1, see NOTE below
Coefficient of Quartile Variation (CQV, sometimes called coefficient of dispersion) calculated as (Q3 - Q1) / (Q3 + Q1), see NOTE below
Outliers (total count and percentage), using boxplot.stats
NOTE: These values are calculated using the same algorithm as used by Minitab and SPSS: p[k] = E[F(x[k])]. See Type 6 on the quantile page.
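The quartile-based header values can be reproduced by hand in base R. The sketch below uses a hypothetical sample and only standard stats functions; it shows the formulas described above, not how the package computes them internally.

```r
x <- c(2, 4, 4, 5, 7, 9, 10, 12, 30)   # hypothetical sample

# Type 6 quantiles, the definition referenced in the NOTE
q  <- quantile(x, probs = c(0.25, 0.75), type = 6, names = FALSE)
q1 <- q[1]   # 4 for this sample
q3 <- q[2]   # 11 for this sample

# R's default (type 7) can give a different answer for the same data
q3_default <- quantile(x, probs = 0.75, names = FALSE)   # 10 for this sample

iqr <- q3 - q1                 # Interquartile Range
cqv <- (q3 - q1) / (q3 + q1)   # Coefficient of Quartile Variation
cv  <- sd(x) / mean(x)         # Coefficient of Variation

# Outliers as flagged by boxplot.stats()
out <- boxplot.stats(x)$out
```

The comparison with type 7 shows why the quantile type matters: the same data can yield a different Q3, and therefore a different IQR and CQV.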
For dates and times of any class, these additional values will be calculated with na.rm = TRUE and shown in the header:
Oldest, using min
Newest, using max, with difference between newest and oldest
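As a rough illustration of these date values, the following base R sketch computes the oldest date, the newest date and their difference for a hypothetical vector containing a missing value. It shows the calculation only, not the package's header formatting.

```r
# Hypothetical dates, including one missing value
dates <- as.Date(c("2021-03-15", "2020-01-01", NA, "2021-12-31"))

oldest <- min(dates, na.rm = TRUE)
newest <- max(dates, na.rm = TRUE)

# Subtracting two Date objects gives a difftime in days
span <- newest - oldest
print(span)
```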
In factors, levels that do not occur in the input data are kept by default (droplevels = FALSE in the usage above); set droplevels = TRUE to drop them.
The function top_freq will include more than n rows if there are ties. Use a negative number for n (like n = -3) to select the bottom n values.
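The tie handling can be sketched in base R as follows. The helper top_n_with_ties is hypothetical and is not the package's implementation. It also returns named counts, whereas top_freq is described as returning the items with counts as names.

```r
# Hypothetical helper illustrating tie-aware top/bottom selection
top_n_with_ties <- function(counts, n) {
  # Sort descending for the top n, ascending for the bottom n
  sorted <- sort(counts, decreasing = n > 0)
  k <- min(abs(n), length(sorted))
  cutoff <- sorted[k]   # count of the n-th item
  # Keep everything at least as extreme as the n-th item, so ties are included
  if (n > 0) sorted[sorted >= cutoff] else sorted[sorted <= cutoff]
}

counts <- c(a = 5, b = 3, c = 3, d = 1)

top2    <- top_n_with_ties(counts, 2)    # a, b and c: b and c tie for second place
bottom1 <- top_n_with_ties(counts, -1)   # d only
```

Asking for the top 2 returns three items here because the second and third counts are equal.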
The return value is a data.frame (with an additional class "freq") with five columns: item, count, percent, cum_count and cum_percent.
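To make the five columns concrete, the base R sketch below builds a comparable data.frame by hand from a hypothetical character vector. It is an illustration only: it assumes percent is stored as a fraction between 0 and 1, which the text above does not specify, and it does not add the "freq" class.

```r
x <- c("B", "A", "B", "C", "B", "A")   # hypothetical input

# Count the items and sort by descending count
tab <- sort(table(x), decreasing = TRUE)

result <- data.frame(
  item  = names(tab),
  count = as.integer(tab),
  stringsAsFactors = FALSE
)

# Assumption: percentages expressed as fractions of the total
result$percent     <- result$count / sum(result$count)
result$cum_count   <- cumsum(result$count)
result$cum_percent <- cumsum(result$percent)

print(result)
```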
Extending the freq() function
Interested in extending the freq() function with your own class? Add a method like the one below to your package, and optionally define some header info by passing a list to the .add_header parameter, as in the example below for class difftime. This example assumes that you use the roxygen2 package for package development.
#' @method freq difftime
#' @importFrom cleaner freq.default
#' @export
#' @noRd
freq.difftime <- function(x, ...) {
  freq.default(x = x, ...,
               .add_header = list(units = attributes(x)$units))
}
Be sure to call freq.default in your function and not just freq. Also, add cleaner to the Imports: field of your DESCRIPTION file, to make sure that it will be installed with your package, e.g.:
Imports: cleaner
freq(unclean$gender, markdown = FALSE)

freq(x = clean_factor(unclean$gender,
                      levels = c("^m" = "Male", "^f" = "Female")),
     markdown = TRUE,
     title = "Frequencies of a cleaned version for a markdown report!",
     header = FALSE,
     quote = TRUE)