crosstable_statistics: Measures of association for contingency tables

View source: R/xtab_statistics.R

cramers_v    R Documentation

Measures of association for contingency tables

Description

This function calculates various measures of association for contingency tables and returns the statistic and p-value. Supported measures are Cramer's V, Phi, Spearman's rho, Kendall's tau and Pearson's r.

Usage

cramers_v(tab, ...)

cramer(tab, ...)

## S3 method for class 'formula'
cramers_v(
  formula,
  data,
  ci.lvl = NULL,
  n = 1000,
  method = c("dist", "quantile"),
  ...
)

phi(tab, ...)

crosstable_statistics(
  data,
  x1 = NULL,
  x2 = NULL,
  statistics = c("auto", "cramer", "phi", "spearman", "kendall", "pearson", "fisher"),
  weights = NULL,
  ...
)

xtab_statistics(
  data,
  x1 = NULL,
  x2 = NULL,
  statistics = c("auto", "cramer", "phi", "spearman", "kendall", "pearson", "fisher"),
  weights = NULL,
  ...
)

Arguments

tab

A table() or ftable(). Tables of class xtabs() and others will be coerced to ftable objects.

...

Other arguments, passed down to the statistic functions chisq.test(), fisher.test() or cor.test().

formula

A formula of the form lhs ~ rhs where lhs is a numeric variable giving the data values and rhs a factor giving the corresponding groups.

data

A data frame or a table object. If a table object, x1 and x2 will be ignored. For Kendall's tau, Spearman's rho or Pearson's product moment correlation coefficient, data needs to be a data frame. If x1 and x2 are not specified, the first two columns of the data frame are used as variables to compute the crosstab.

ci.lvl

Scalar between 0 and 1. If not NULL, returns a data frame including lower and upper confidence intervals.

n

Number of bootstraps to be generated.

method

Character vector, indicating if confidence intervals should be based on bootstrap standard error, multiplied by the value of the quantile function of the t-distribution (default), or on sample quantiles of the bootstrapped values. See 'Details' in boot_ci(). May be abbreviated.

x1

Name of the first variable that should be used to compute the contingency table. If data is a table object, this argument will be ignored.

x2

Name of the second variable that should be used to compute the contingency table. If data is a table object, this argument will be ignored.

statistics

Name of measure of association that should be computed. May be one of "auto", "cramer", "phi", "spearman", "kendall", "pearson" or "fisher". See 'Details'.

weights

Name of the variable in data that holds the weights applied to the observations. Default is NULL, so no weights are used.

Details

The p-values for Cramer's V and the Phi coefficient are based on chisq.test(). If any expected value of a table cell is smaller than 5, or smaller than 10 when the df is 1, fisher.test() is used to compute the p-value instead. Setting statistics = "fisher" forces the use of fisher.test() for the p-value, regardless of the expected values. The test statistic is calculated with cramers_v() or phi(), respectively.
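The selection rule above can be sketched in base R. This is an illustration only, not the sjstats implementation: pick_p_value() and its force_fisher argument are hypothetical names, and calling chisq.test() with its default continuity correction is an assumption.

```r
# Hypothetical helper illustrating the p-value rule; not part of sjstats.
pick_p_value <- function(tab, force_fisher = FALSE) {
  # chisq.test() warns for small expected counts; we only need its output here
  chi <- suppressWarnings(stats::chisq.test(tab))
  expected <- chi$expected
  df <- unname(chi$parameter)

  # any expected cell < 5, or < 10 when the table has one degree of freedom
  small_cells <- any(expected < 5) || (df == 1 && any(expected < 10))
  use_fisher <- force_fisher || small_cells

  p <- if (use_fisher) stats::fisher.test(tab)$p.value else chi$p.value
  list(p.value = p, fisher = use_fisher)
}

small <- matrix(c(3, 1, 1, 3), nrow = 2)     # expected counts of 2 per cell
large <- matrix(c(40, 20, 20, 40), nrow = 2) # expected counts of 30 per cell

res_small <- pick_p_value(small)
res_large <- pick_p_value(large)
res_forced <- pick_p_value(large, force_fisher = TRUE)
```

Here res_small$fisher is TRUE because the expected counts are below 5, res_large$fisher is FALSE, and res_forced$fisher is TRUE because the exact test was requested explicitly.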

Both test statistic and p-value for Spearman's rho, Kendall's tau and Pearson's r are calculated with cor.test().
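As a rough illustration of what cor.test() provides for these measures, the following base R sketch uses the mtcars data that ships with R. The dataset and variable choice are assumptions made for the example; the exact and continuity arguments are the ones that can be passed through ... as described under 'Arguments'.

```r
# Kendall's tau; exact = FALSE requests the normal approximation,
# which is needed anyway because the data contain ties
kendall <- stats::cor.test(mtcars$cyl, mtcars$gear,
                           method = "kendall", exact = FALSE)

# Spearman's rho with continuity correction
spearman <- stats::cor.test(mtcars$cyl, mtcars$gear,
                            method = "spearman",
                            exact = FALSE, continuity = TRUE)

# Each result carries the estimate, the test statistic and the p-value
kendall$estimate
kendall$statistic
kendall$p.value
spearman$estimate
```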

When statistics = "auto", only Cramer's V or Phi are calculated, based on the dimension of the table (i.e. if the table has more than two rows or columns, Cramer's V is calculated, else Phi).
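The dimension-based choice can be sketched as follows. auto_measure() is a hypothetical helper, and the estimates use the textbook definitions (Phi as the square root of chi-squared over n, Cramer's V additionally dividing by the smaller of rows minus one and columns minus one); the values returned by sjstats may differ in detail.

```r
# Hypothetical helper illustrating the "auto" choice; not the sjstats code.
auto_measure <- function(tab) {
  # uncorrected chi-squared statistic
  chi <- suppressWarnings(stats::chisq.test(tab, correct = FALSE))
  chi2 <- unname(chi$statistic)
  n <- sum(tab)

  if (nrow(tab) > 2 || ncol(tab) > 2) {
    # more than two rows or columns: Cramer's V
    k <- min(nrow(tab), ncol(tab)) - 1
    list(method = "cramer", estimate = sqrt(chi2 / (n * k)))
  } else {
    # 2x2 table: Phi
    list(method = "phi", estimate = sqrt(chi2 / n))
  }
}

tab_2x2 <- matrix(c(40, 20, 20, 40), nrow = 2)
tab_2x3 <- matrix(c(10, 20, 30, 30, 20, 10), nrow = 2)

res_2x2 <- auto_measure(tab_2x2)
res_2x3 <- auto_measure(tab_2x3)
```

For the 2x2 table the helper reports "phi", and for the 2x3 table it reports "cramer".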

Value

For phi(), the table's Phi value. For cramers_v(), the table's Cramer's V.

For crosstable_statistics(), a list with the following components:

  • estimate: the value of the estimated measure of association.

  • p.value: the p-value for the test.

  • statistic: the value of the test statistic.

  • stat.name: the name of the test statistic.

  • stat.html: if applicable, the name of the test statistic, in HTML-format.

  • df: the degrees of freedom for the contingency table.

  • method: character string indicating the name of the measure of association.

  • method.html: if applicable, the name of the measure of association, in HTML-format.

  • method.short: the short form of the association measure; equals the statistics argument.

  • fisher: logical, indicating whether Fisher's exact test was used to calculate the p-value.

References

Ben-Shachar, M.S., Patil, I., Thériault, R., Wiernik, B.M., Lüdecke, D. (2023). Phi, Fei, Fo, Fum: Effect Sizes for Categorical Data That Use the Chi-Squared Statistic. Mathematics, 11, 1982. doi:10.3390/math11091982

Examples

library(sjstats)

# Phi coefficient for 2x2 tables
tab <- table(sample(1:2, 30, TRUE), sample(1:2, 30, TRUE))
phi(tab)

# Cramer's V for nominal variables with more than 2 categories
tab <- table(sample(1:2, 30, TRUE), sample(1:3, 30, TRUE))
cramer(tab)

# formula notation
data(efc)
cramer(e16sex ~ c161sex, data = efc)

# bootstrapped confidence intervals
cramer(e16sex ~ c161sex, data = efc, ci.lvl = .95, n = 100)

# 2x2 table, compute Phi automatically
crosstable_statistics(efc, e16sex, c161sex)

# more dimensions than 2x2, compute Cramer's V automatically
crosstable_statistics(efc, c172code, c161sex)

# ordinal data, use Kendall's tau
crosstable_statistics(efc, e42dep, quol_5, statistics = "kendall")

# calculate Spearman's rho, with continuity correction
crosstable_statistics(efc,
  e42dep,
  quol_5,
  statistics = "spearman",
  exact = FALSE,
  continuity = TRUE
)

sjstats documentation built on May 29, 2024, 12:09 p.m.