crosstable_statistics: Measures of association for contingency tables
In sjstats: Collection of Convenient Functions for Common Statistical Computations

Description

This function calculates various measures of association for contingency tables and returns the statistic and p-value. Supported measures are Cramer's V, Phi, Spearman's rho, Kendall's tau and Pearson's r.
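The two chi-squared based measures have simple textbook definitions. Phi = sqrt(chi^2 / n) and Cramer's V = sqrt(chi^2 / (n * (k - 1))), where n is the total count and k is the smaller of the number of rows and columns. The base-R sketch below computes both from the uncorrected Pearson chi-squared statistic. The helper names `phi_sketch()` and `cramers_v_sketch()` are hypothetical, and sjstats' own implementation may differ in details such as continuity correction.

```r
# Hypothetical helpers illustrating the textbook definitions (base R only).
# Both use the Pearson chi-squared statistic without continuity correction.
phi_sketch <- function(tab) {
  chi2 <- suppressWarnings(chisq.test(tab, correct = FALSE)$statistic)
  unname(sqrt(chi2 / sum(tab)))
}

cramers_v_sketch <- function(tab) {
  chi2 <- suppressWarnings(chisq.test(tab, correct = FALSE)$statistic)
  k <- min(dim(tab)) - 1  # smaller table dimension minus one
  unname(sqrt(chi2 / (sum(tab) * k)))
}

# A perfectly associated 2x2 table: Phi and V both equal 1
tab <- matrix(c(10, 0, 0, 10), nrow = 2)
phi_sketch(tab)
cramers_v_sketch(tab)

# A perfectly associated 3x3 table: Phi exceeds 1, V rescales it to 1
tab3 <- diag(10, 3)
phi_sketch(tab3)        # sqrt(2), about 1.414
cramers_v_sketch(tab3)  # 1
```

For 2x2 tables k - 1 = 1, so both measures coincide. This is why Phi is only used for 2x2 tables.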

Usage

cramers_v(tab, ...)

cramer(tab, ...)

## S3 method for class 'formula'
cramers_v(
  formula,
  data,
  ci.lvl = NULL,
  n = 1000,
  method = c("dist", "quantile"),
  ...
)

phi(tab, ...)

crosstable_statistics(
  data,
  x1 = NULL,
  x2 = NULL,
  statistics = c("auto", "cramer", "phi", "spearman", "kendall", "pearson", "fisher"),
  weights = NULL,
  ...
)

xtab_statistics(
  data,
  x1 = NULL,
  x2 = NULL,
  statistics = c("auto", "cramer", "phi", "spearman", "kendall", "pearson", "fisher"),
  weights = NULL,
  ...
)

Arguments

`tab`	A `table()` or `ftable()`. Tables of class `xtabs()` and other classes will be coerced to `ftable` objects.
`...`	Other arguments, passed down to the statistic functions `chisq.test()`, `fisher.test()` or `cor.test()`.
`formula`	A formula of the form `lhs ~ rhs` where `lhs` is a numeric variable giving the data values and `rhs` a factor giving the corresponding groups.
`data`	A data frame or a table object. If a table object, `x1` and `x2` will be ignored. For Kendall's tau, Spearman's rho or Pearson's product moment correlation coefficient, `data` needs to be a data frame. If `x1` and `x2` are not specified, the first two columns of the data frame are used as variables to compute the crosstab.
`ci.lvl`	Scalar between 0 and 1. If not `NULL`, returns a data frame that also includes the lower and upper confidence limits.
`n`	Number of bootstrap samples to be generated.
`method`	Character vector, indicating if confidence intervals should be based on bootstrap standard error, multiplied by the value of the quantile function of the t-distribution (default), or on sample quantiles of the bootstrapped values. See 'Details' in `boot_ci()`. May be abbreviated.
`x1`	Name of the first variable that should be used to compute the contingency table. If `data` is a table object, this argument will be ignored.
`x2`	Name of the second variable that should be used to compute the contingency table. If `data` is a table object, this argument will be ignored.
`statistics`	Name of measure of association that should be computed. May be one of `"auto"`, `"cramer"`, `"phi"`, `"spearman"`, `"kendall"`, `"pearson"` or `"fisher"`. See 'Details'.
`weights`	Name of the variable in `data` that holds the weights to be applied to all observations. Default is `NULL`, so no weights are used.
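The two options for `method` can be illustrated with a plain bootstrap of Cramer's V. The sketch below is hypothetical: `boot_ci_sketch()` and `cramers_v_from_xy()` are made-up names. It assumes that observations are resampled with replacement and that the "dist" interval is centred on the point estimate with n - 1 degrees of freedom. It also assumes that "quantile" takes the empirical alpha/2 and 1 - alpha/2 quantiles. sjstats delegates this step to `boot_ci()`, whose exact choices may differ.

```r
# Hypothetical sketch of the "dist" and "quantile" bootstrap intervals.
cramers_v_from_xy <- function(x, y) {
  tab <- table(x, y)
  chi2 <- suppressWarnings(chisq.test(tab, correct = FALSE)$statistic)
  unname(sqrt(chi2 / (sum(tab) * (min(dim(tab)) - 1))))
}

boot_ci_sketch <- function(x, y, ci.lvl = 0.95, n = 1000,
                           method = c("dist", "quantile")) {
  method <- match.arg(method)
  est <- cramers_v_from_xy(x, y)
  # Resample rows (pairs of x and y) with replacement, n times
  reps <- replicate(n, {
    idx <- sample(seq_along(x), replace = TRUE)
    cramers_v_from_xy(x[idx], y[idx])
  })
  alpha <- 1 - ci.lvl
  if (method == "dist") {
    # Bootstrap standard error times the t quantile (assumed df = n - 1)
    margin <- qt(1 - alpha / 2, df = n - 1) * sd(reps)
    c(estimate = est, conf.low = est - margin, conf.high = est + margin)
  } else {
    # Empirical quantiles of the bootstrapped values
    q <- unname(quantile(reps, c(alpha / 2, 1 - alpha / 2)))
    c(estimate = est, conf.low = q[1], conf.high = q[2])
  }
}

set.seed(1)
x <- sample(c("a", "b", "c"), 200, replace = TRUE)
y <- sample(c("u", "v"), 200, replace = TRUE)
boot_ci_sketch(x, y, n = 200)
boot_ci_sketch(x, y, n = 200, method = "quantile")
```

The "dist" interval is symmetric around the estimate by construction. The "quantile" interval follows the shape of the bootstrap distribution, which for Cramer's V is bounded below by 0 and often skewed.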

Details

The p-value for Cramer's V and the Phi coefficient is based on chisq.test(). If any expected cell count is smaller than 5, or smaller than 10 when the table has one degree of freedom, fisher.test() is used to compute the p-value instead. If statistics = "fisher", fisher.test() is always used to compute the p-value. The test statistic itself is calculated with cramers_v() or phi(), respectively.
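The fallback rule for the p-value can be written out in a few lines of base R. `xtab_p_value()` and its `force_fisher` argument are hypothetical names used only for illustration. The sketch calls chisq.test() with its defaults, which apply a continuity correction to 2x2 tables. Whether sjstats does the same is not stated here.

```r
# Hypothetical sketch of the p-value rule: chi-squared test by default,
# Fisher's exact test when expected counts are small (or when forced).
xtab_p_value <- function(tab, force_fisher = FALSE) {
  chi <- suppressWarnings(chisq.test(tab))
  expected <- chi$expected
  df <- unname(chi$parameter)
  small <- any(expected < 5) || (df == 1 && any(expected < 10))
  if (force_fisher || small) {
    list(p.value = fisher.test(tab)$p.value, fisher = TRUE)
  } else {
    list(p.value = unname(chi$p.value), fisher = FALSE)
  }
}

# All expected counts are 6 and df = 1, so Fisher's exact test is used
tab <- matrix(c(8, 4, 4, 8), nrow = 2)
xtab_p_value(tab)$fisher  # TRUE
```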

Both test statistic and p-value for Spearman's rho, Kendall's tau and Pearson's r are calculated with cor.test().
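These three measures come straight from cor.test(), so the equivalent base-R calls look roughly like the sketch below. The toy ordinal data are made up. `exact = FALSE` is passed here only to avoid warnings about ties; it is not a default of crosstable_statistics().

```r
# Spearman's rho, Kendall's tau and Pearson's r via base R's cor.test().
# Made-up ordinal data with a strong positive association.
set.seed(42)
x <- sample(1:4, 50, replace = TRUE)
y <- pmin(4, x + sample(0:1, 50, replace = TRUE))

est <- sapply(c("spearman", "kendall", "pearson"), function(m) {
  res <- suppressWarnings(cor.test(x, y, method = m, exact = FALSE))
  unname(res$estimate)
})
est  # named vector: one estimate per method
```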

When statistics = "auto", only Cramer's V or Phi is calculated, depending on the dimensions of the table: if the table has more than two rows or more than two columns, Cramer's V is calculated; otherwise, Phi.
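The "auto" selection amounts to a check on the table dimensions. `auto_statistic()` is a hypothetical helper that mirrors the rule described above, not a function exported by sjstats.

```r
# Hypothetical sketch of the "auto" rule: Phi for 2x2 tables, Cramer's V otherwise.
auto_statistic <- function(tab) {
  if (any(dim(tab) > 2)) "cramer" else "phi"
}

auto_statistic(table(c(1, 2, 1, 2), c(1, 1, 2, 2)))  # returns "phi" (2x2)
auto_statistic(table(c(1, 2, 3, 1), c(1, 2, 1, 2)))  # returns "cramer" (3x2)
```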

Value

For phi(), the table's Phi value. For cramers_v(), the table's Cramer's V.

For crosstable_statistics(), a list with the following components:

estimate: the value of the estimated measure of association.
p.value: the p-value for the test.
statistic: the value of the test statistic.
stat.name: the name of the test statistic.
stat.html: if applicable, the name of the test statistic, in HTML format.
df: the degrees of freedom for the contingency table.
method: character string indicating the name of the measure of association.
method.html: if applicable, the name of the measure of association, in HTML format.
method.short: the short name of the association measure; equals the statistics argument.
fisher: logical, indicating whether Fisher's exact test was used to calculate the p-value.

References

Ben-Shachar, M.S., Patil, I., Thériault, R., Wiernik, B.M., Lüdecke, D. (2023). Phi, Fei, Fo, Fum: Effect Sizes for Categorical Data That Use the Chi-Squared Statistic. Mathematics, 11, 1982. doi:10.3390/math11091982

Examples

# Phi coefficient for 2x2 tables
tab <- table(sample(1:2, 30, TRUE), sample(1:2, 30, TRUE))
phi(tab)

# Cramer's V for nominal variables with more than 2 categories
tab <- table(sample(1:2, 30, TRUE), sample(1:3, 30, TRUE))
cramer(tab)

# formula notation
data(efc)
cramer(e16sex ~ c161sex, data = efc)

# bootstrapped confidence intervals
cramer(e16sex ~ c161sex, data = efc, ci.lvl = .95, n = 100)

# 2x2 table, compute Phi automatically
crosstable_statistics(efc, e16sex, c161sex)

# more dimensions than 2x2, compute Cramer's V automatically
crosstable_statistics(efc, c172code, c161sex)

# ordinal data, use Kendall's tau
crosstable_statistics(efc, e42dep, quol_5, statistics = "kendall")

# calculate Spearman's rho, with continuity correction
crosstable_statistics(efc,
  e42dep,
  quol_5,
  statistics = "spearman",
  exact = FALSE,
  continuity = TRUE
)

sjstats documentation built on May 29, 2024, 12:09 p.m.