Effect Sizes for Contingency Tables"

options(knitr.kable.NA = "")
knitr::opts_chunk$set(comment = ">")
options(digits = 3)


.eval_if_requireNamespace <- function(...) {
  pkgs <- c(...)
  knitr::opts_chunk$get("eval") &&
    all(vapply(pkgs, requireNamespace, TRUE, quietly = TRUE))

This vignette provides a review of effect sizes for 1- and 2-D contingency tables, which are typically analysed with chisq.test() and fisher.test().

options(es.use_symbols = TRUE) # get nice symbols when printing! (On Windows, requires R >= 4.2.0)

Nominal Correlation

2-by-2 Tables

For 2-by-2 contingency tables, $\phi$ (Phi) is homologous (though directionless) to the biserial correlation between two dichotomous variables, with 0 representing no association, and 1 representing a perfect association.

(MPG_Gear <- table(mtcars$mpg < 20, mtcars$vs))

phi(MPG_Gear, adjust = FALSE)

# Same as:
cor(mtcars$mpg < 20, mtcars$vs)

A "cousin" effect size is Pearson's contingency coefficient, however it is not a true measure of correlation, but rather a type of normalized $\chi^2$ (see chisq_to_pearsons_c()):


Larger Tables

For larger contingency tables, Cramér's V, Tschuprow's T, Cohen's w, and Pearson's C can be used.

Both Cramér's V and Tschuprow's T are true measures of correlation: they range from 0 to 1, with 0 indicating complete independence between the nominal variables, and 1 indicating complete dependence. For square tables, they are equal, however for non-square tables $T < V$; While Cramér's V defines complete dependence as "it is possible to know exactly the value of the columns from the rows or know exactly the value of the rows from the columns", Tschuprow's T required both to be true to achieve complete dependence - something that is not possible for non-square tables. For example:


In this case, if you know the food product, you know if it is vegan or not, but knowing if the food is vegan or not will not always let you know what food product it is.

cramers_v(food_class, adjust = FALSE)

tschuprows_t(food_class, adjust = FALSE)

Cohen's w and Pearson's C are not true measures of correlation, but are two types of normalized $\chi^2$ values. While Pearson's C is capped at 1, Cohen's w can be larger than 1 (for both, 0 indicates no association between the variables).






cohens_w(Music_preferences2) # > 1

(Cramer's V, Tschuprow's T, and Cohen's w can also be used for 2-by-2 tables, but there they are equivalent to $\phi$.)

For a Bayesian $\chi^2$-test

A Bayesian estimate of these effect sizes can also be provided based on BayesFactor's version of a $\chi^2$-test via the effectsize() function:

BFX <- contingencyTableBF(MPG_Gear, sampleType = "jointMulti")

effectsize(BFX, type = "phi") # for 2 * 2

BFX <- contingencyTableBF(Music_preferences2, sampleType = "jointMulti")

effectsize(BFX, type = "cramers_v")

effectsize(BFX, type = "tschuprows_t")

effectsize(BFX, type = "cohens_w")

effectsize(BFX, type = "pearsons_c")

Goodness of Fit

Cohen's w and Pearson's C are also applicable to tests of goodness-of-fit, where they indicate the degree of deviation from the hypothetical probabilities, with 0 reflecting no deviation.

O <- c(89, 37, 130, 28, 2) # observed group sizes
E <- c(.40, .20, .20, .15, .05) # expected group freq

chisq.test(O, p = E)

pearsons_c(O, p = E)

cohens_w(O, p = E)

However, these values are not (anti)correlations - they do not properly adjust for the "shape" of the expected multinational distribution, making them inflated (additionally, Cohen's w can be larger than 1, making it harder to interpret).

For these reasons, we recommend the $פ$ (Fei) coefficient, which is a measure of anti-correlation between the observed and the expected distributions, ranging from 0 (observed distribution matches the expected distribution perfectly) and 1 (the observed distribution is maximally different than the expected one).

fei(O, p = E)

# Observed perfectly matches Expected
(O1 <- E * 286)

fei(O1, p = E)

# Observed deviates maximally from Expected:
# All observed values are in the least expected class!
(O2 <- c(rep(0, 4), 286))

fei(O2, p = E)

Differences in Proportions

For 2-by-2 tables, we can also compute the Odds-ratio (OR), where each column represents a different group. Values larger than 1 indicate that the odds are higher in the first group (and vice versa).


chisq.test(RCT_table) # or fisher.test(RCT_table)


We can also compute the Risk-ratio (RR), which is the ratio between the proportions of the two groups, and the Absolute Risk Reduction (ARR), which is the difference between the proportions of the two groups - both are measures which some claim to be more intuitive.



Additionally, Cohen's h can also be computed, which uses the arcsin transformation. Negative values indicate smaller proportion in the first group (and vice versa).


For a Bayesian $\chi^2$-test

A Bayesian estimate of these effect sizes can also be provided based on BayesFactor's version of a $\chi^2$-test via the effectsize() function:

BFX <- contingencyTableBF(RCT_table, sampleType = "jointMulti")

effectsize(BFX, type = "or")

effectsize(BFX, type = "rr")

effectsize(BFX, type = "cohens_h")

Paired Contingency Tables

For dependent (paired) contingency tables, Cohen's g represents the symmetry of the table, ranging between 0 (perfect symmetry) and 0.5 (perfect asymmetry).

For example, these two tests seem to be equally predictive of the disease they are screening:


phi(screening_test$Diagnosis, screening_test$Test1)

phi(screening_test$Diagnosis, screening_test$Test2)

Does this mean they give the same number of positive/negative results?

tests <- table(Test1 = screening_test$Test1, Test2 = screening_test$Test2)



Test 1 gives more positive results than test 2!


Try the effectsize package in your browser

Any scripts or data that you put into this service are public.

effectsize documentation built on Sept. 14, 2023, 5:07 p.m.