freq_table: Estimate Counts, Percentages, and Confidence Intervals in...


freq_table {freqtables}    R Documentation

Estimate Counts, Percentages, and Confidence Intervals in dplyr Pipelines

Description

The freq_table function produces one-way and two-way frequency tables for categorical variables. In addition to frequencies, freq_table displays percentages along with their standard errors and confidence intervals. For two-way tables only, freq_table also displays row (subgroup) percentages, standard errors, and confidence intervals.

freq_table is intended to be used in a dplyr pipeline.

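All standard errors are calculated as sqrt(proportion * (1 - proportion) / (n - 1)), where n is the count that serves as the denominator of the proportion (n_total for overall percentages and, in two-way tables, n_row for row percentages).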
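As a worked check of this formula, the base R sketch below recomputes the se value for the am = 0 row of the first one-way example, using n_total as the denominator. It is an illustration only, not freq_table's own code.

```r
# Sketch: reproduce the se column for the am = 0 row of freq_table(mtcars, am).
# Uses only base R; mtcars ships with R's datasets package.
n       <- sum(mtcars$am == 0)   # 19 cars with automatic transmissions
n_total <- nrow(mtcars)          # 32 cars in total
p       <- n / n_total           # proportion, 0.59375

# Standard error of the proportion with n_total - 1 in the denominator
se <- sqrt(p * (1 - p) / (n_total - 1))

# freq_table reports percentages, so scale by 100
round(se * 100, 2)               # 8.82
```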

For one-way tables, the default 95 percent confidence intervals are logit transformed confidence intervals equivalent to those used by Stata. Alternatively, setting ci_type = "wald" returns Wald ("linear") confidence intervals.

For two-way tables, freq_table always returns logit transformed confidence intervals equivalent to those used by Stata.
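The two interval types can be sketched in base R. The helper ci_sketch() is hypothetical, not a freqtables function; it assumes the Stata-style logit interval is built on the log-odds scale with standard error se / (p * (1 - p)) and then back-transformed with the inverse logit. Under those assumptions it reproduces the lcl and ucl values of the am = 0 row in the one-way examples (40.9 to 75.5 for logit, 41.4 to 77.4 for Wald).

```r
# Hypothetical helper (not part of freqtables) sketching both interval types.
ci_sketch <- function(n, n_total, percent_ci = 95, ci_type = "logit") {
  p      <- n / n_total
  se     <- sqrt(p * (1 - p) / (n_total - 1))
  alpha  <- 1 - percent_ci / 100
  t_crit <- qt(1 - alpha / 2, df = n_total - 1)
  if (ci_type == "wald") {
    # Wald ("linear"): symmetric interval on the proportion scale
    lcl <- p - t_crit * se
    ucl <- p + t_crit * se
  } else {
    # Logit (assumed Stata-style): interval on the log-odds scale,
    # then back-transformed to a proportion with plogis()
    se_logit <- se / (p * (1 - p))
    lcl <- plogis(qlogis(p) - t_crit * se_logit)
    ucl <- plogis(qlogis(p) + t_crit * se_logit)
  }
  # Report as percentages, rounded like the printed tibbles
  round(100 * c(lcl = lcl, ucl = ucl), 1)
}

ci_sketch(19, 32)                    # logit: 40.9, 75.5
ci_sketch(19, 32, ci_type = "wald")  # Wald:  41.4, 77.4
```

Because the logit interval is formed on the log-odds scale, it is asymmetric around the percentage and always stays between 0 and 100, whereas the Wald interval is symmetric.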

Usage

freq_table(.data, ..., percent_ci = 95, ci_type = "logit", drop = FALSE)

Arguments

.data

A data frame. If it is already grouped (i.e., it inherits from class "grouped_df"), freq_table will ungroup it to prevent unexpected results.

For two-way tables, the count for each level of the variable in the first argument to freq_table is the denominator for row percentages and their confidence intervals. Said another way, if the goal of the analysis is to compare percentages of some characteristic across two or more groups of interest, then the variable in the first argument to freq_table should contain the groups of interest, and the variable in the second argument should contain the characteristic of interest.
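A base R analogy (not freq_table's internals) shows how the first variable sets the row denominators. With am first and cyl second, each row percentage is a cell count divided by that level's n_row; the counts match the two-way example, whose row-percentage columns are truncated there.

```r
# Base-R analogy: am defines the rows, so each row's count (n_row) is the
# denominator for the row percentages of cyl.
tab <- table(am = mtcars$am, cyl = mtcars$cyl)
tab                      # cell counts, e.g. 12 cars with am = 0 and cyl = 8
rowSums(tab)             # n_row: 19 for am = 0, 13 for am = 1

# Row percentages: divide each cell by its row total
row_pct <- round(100 * prop.table(tab, margin = 1), 1)
row_pct                  # e.g. 63.2 percent of am = 0 cars have 8 cylinders
```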

...

Categorical variables to be used in calculations. Currently, freq_table accepts one or two variables – not more.

By default, if ... includes a factor variable with a level (category) that is unobserved in the data, that level will still appear in the results with a count (n) equal to zero. This behavior can be changed using the drop parameter (see below). When n = 0, the confidence intervals will be NaN; as the drop = FALSE example shows, the default logit intervals are also NaN for a category that accounts for 100 percent of observations.
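The NaN limits follow from the arithmetic. In this sketch, which assumes the logit formula sketched for the Description above rather than freq_table's actual code, a proportion of 0 or 1 gives a standard error of 0 and infinite log-odds, so the log-odds standard error is 0 / 0.

```r
# Sketch: why logit limits are NaN when a percentage is 0 or 100,
# using the all-female example (n_total = 4, so 3 degrees of freedom).
p  <- c(female = 4 / 4, male = 0 / 4)
se <- sqrt(p * (1 - p) / (4 - 1))       # 0 for both levels
se_logit <- se / (p * (1 - p))          # 0 / 0 is NaN in R

t_crit <- qt(0.975, df = 3)             # about 3.18
lcl <- plogis(qlogis(p) - t_crit * se_logit)
ucl <- plogis(qlogis(p) + t_crit * se_logit)
lcl                                     # NaN for both levels
ucl                                     # NaN for both levels
```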

percent_ci

Sets the level, as a percentage, for confidence intervals. The default is percent_ci = 95 for 95 percent confidence intervals. The percentage value entered (e.g., 95) is converted to an alpha level as 1 - (percent_ci / 100). It is then converted to a two-sided probability as (1 - alpha / 2), which is used to calculate a critical value from Student's t distribution with n - 1 degrees of freedom.
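The conversion from percent_ci to a critical value can be traced in base R. This sketch uses the am examples (n_total = 32, so 31 degrees of freedom) and reproduces their t_crit columns; it illustrates the steps described above rather than quoting freq_table's source.

```r
# Sketch of the percent_ci -> critical value conversion.
percent_ci <- 95
alpha  <- 1 - (percent_ci / 100)          # 0.05
t_crit <- qt(1 - alpha / 2, df = 32 - 1)  # two-sided critical value
round(t_crit, 2)                          # 2.04, t_crit for 95 percent

# The same steps with percent_ci = 99
round(qt(1 - (1 - 99 / 100) / 2, df = 31), 2)   # 2.74
```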

ci_type

Selects the method used to estimate confidence intervals. The default for both one-way and two-way tables is logit transformed ("logit"). For one-way tables only, setting ci_type = "wald" calculates Wald ("linear") confidence intervals instead.

drop

If FALSE (the default), unobserved factor levels will be included in the returned frequency table with an n of 0. For example, if you have a factor variable, gender, but no males in your data, then the frequency table returned by freq_table(df, gender) would still contain a row for males with n = 0. If drop is set to TRUE, the resulting frequency table would not include a row for males at all.
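The idea behind drop can be illustrated with a base R analogy; this is not freq_table's implementation. A factor keeps its declared levels even when some are unobserved, so a count by level reports 0 for them until the unused levels are dropped.

```r
# Base-R analogy: a factor remembers its levels, so unobserved levels are
# counted as 0 unless they are dropped.
gender <- factor(c(1, 1, 1, 1), levels = c(1, 2),
                 labels = c("female", "male"))
table(gender)               # female 4, male 0  (like drop = FALSE)
table(droplevels(gender))   # female 4 only     (like drop = TRUE)
```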

Value

A tibble with class "freq_table_one_way" or "freq_table_two_way"

References

Agresti, A. (2012). Categorical Data Analysis (3rd ed.). Hoboken, NJ: Wiley.

SAS confidence limits for proportions documentation

Stata confidence limits for proportions documentation

Examples

library(dplyr)
library(freqtables)

data(mtcars)

# --------------------------------------------------------------------------
# One-way frequency table with defaults
#   - The default confidence intervals are logit transformed - matching the
#     method used by Stata
# --------------------------------------------------------------------------
mtcars %>%
  freq_table(am)

#   A tibble: 2 x 9
#   var   cat       n n_total percent    se t_crit   lcl   ucl
#   <chr> <chr> <int>   <int>   <dbl> <dbl>  <dbl> <dbl> <dbl>
# 1 am    0        19      32    59.4  8.82   2.04  40.9  75.5
# 2 am    1        13      32    40.6  8.82   2.04  24.5  59.1


# --------------------------------------------------------------------------
# One-way frequency table with arbitrary confidence intervals
#   - The default behavior of freq_table is to return 95% confidence
#     intervals (two-sided). However, this behavior can be adjusted to return
#     any confidence level. For example, to return 99% confidence intervals,
#     pass 99 to the percent_ci parameter of freq_table as demonstrated below.
# --------------------------------------------------------------------------
mtcars %>%
  freq_table(am, percent_ci = 99)

#   A tibble: 2 x 9
#   var   cat       n n_total percent    se t_crit   lcl   ucl
#   <chr> <chr> <int>   <int>   <dbl> <dbl>  <dbl> <dbl> <dbl>
# 1 am    0        19      32    59.4  8.82   2.74  34.9  79.9
# 2 am    1        13      32    40.6  8.82   2.74  20.1  65.1


# --------------------------------------------------------------------------
# One-way frequency table with Wald confidence intervals
# Optionally, the ci_type = "wald" argument can be used to calculate Wald
# confidence intervals that match those returned by SAS.
# --------------------------------------------------------------------------
mtcars %>%
  freq_table(am, ci_type = "wald")

#   A tibble: 2 x 9
#   var   cat       n n_total percent    se t_crit   lcl   ucl
#   <chr> <chr> <int>   <int>   <dbl> <dbl>  <dbl> <dbl> <dbl>
# 1 am    0        19      32    59.4  8.82   2.04  41.4  77.4
# 2 am    1        13      32    40.6  8.82   2.04  22.6  58.6


# --------------------------------------------------------------------------
# One-way frequency table with drop = FALSE (default)
# --------------------------------------------------------------------------
df <- data.frame(
  id = c(1, 2, 3, 4),
  gender = factor(
    # All females
    c(1, 1, 1, 1),
    levels = c(1, 2),
    labels = c("female", "male"))
)

df %>%
  freq_table(gender)

#   A tibble: 2 x 9
#   var    cat        n n_total percent    se t_crit   lcl   ucl
#   <chr>  <chr>  <int>   <int>   <dbl> <dbl>  <dbl> <dbl> <dbl>
# 1 gender female     4       4     100     0   3.18   NaN   NaN
# 2 gender male       0       4       0     0   3.18   NaN   NaN


# --------------------------------------------------------------------------
# One-way frequency table with drop = TRUE
# --------------------------------------------------------------------------
df <- data.frame(
  id = factor(rep(1:3, each = 4)),
  period = factor(rep(1:4)),
  x = factor(c(0, 0, 0, 1, 0, 1, 1, 1, 0, 0, 1, 1))
)

# Now, suppose we want to drop periods 3 and 4 from our analysis.
# Filtering removes their rows, but period keeps levels 3 and 4, so by
# default freq_table still reports them with counts of 0.

df <- df %>%
  filter(period %in% c(1, 2))

df %>%
  freq_table(period)

#   A tibble: 4 x 9
#   var    cat       n n_total percent    se t_crit    lcl   ucl
#   <chr>  <chr> <int>   <int>   <dbl> <dbl>  <dbl>  <dbl> <dbl>
# 1 period 1         3       6      50  22.4   2.57   9.12  90.9
# 2 period 2         3       6      50  22.4   2.57   9.12  90.9
# 3 period 3         0       6       0   0     2.57 NaN    NaN
# 4 period 4         0       6       0   0     2.57 NaN    NaN

# But we don't want periods 3 and 4 in our frequency table at all. That's
# when we should set drop to TRUE.

df %>%
  freq_table(period, drop = TRUE)

#   A tibble: 2 x 9
#   var    cat       n n_total percent    se t_crit    lcl   ucl
#   <chr>  <chr> <int>   <int>   <dbl> <dbl>  <dbl>  <dbl> <dbl>
# 1 period 1         3       6      50  22.4   2.57   9.12  90.9
# 2 period 2         3       6      50  22.4   2.57   9.12  90.9


# --------------------------------------------------------------------------
# Two-way frequency table with defaults
# Output truncated to fit the screen
# --------------------------------------------------------------------------
mtcars %>%
  freq_table(am, cyl)

#   A tibble: 6 x 17
#   row_var row_cat col_var col_cat     n n_row n_total percent_total se_total
#   <chr>   <chr>   <chr>   <chr>   <int> <int>   <int>         <dbl>    <dbl>
# 1 am      0       cyl     4           3    19      32          9.38     5.24
# 2 am      0       cyl     6           4    19      32         12.5      5.94
# 3 am      0       cyl     8          12    19      32         37.5      8.70
# 4 am      1       cyl     4           8    13      32         25        7.78
# 5 am      1       cyl     6           3    13      32          9.38     5.24
# 6 am      1       cyl     8           2    13      32          6.25     4.35

freqtables documentation built on April 3, 2022, 5:11 p.m.