Introduction to the Cochran-Mantel-Haenszel Test

Introduction

The Cochran-Mantel-Haenszel test (CMH) is an inferential test for the association between two binary variables, while controlling for a third confounding nominal variable [@Cochran1954; @Mantel1959]. Essentially, the CMH test examines the weighted association of a set of 2 $\times$ 2 tables. A common odds ratio relating to the test statistic can also be generated [@Mantel1959]. The CMH test is a common technique in the field of biostatistics, where it is often used for case-control studies.

This introduction briefly describes some of the terminology and concepts surrounding stratified tables. Examples are given which show some basic techniques for working with multidimensional tables in R. Functionality of the samplesizeCMH package is highlighted where it may augment the analysis.

Partial and Marginal Tables

Consider a contingency table comparing $X$ and $Y$ at some fixed level of $Z$. The cross-section of the three-way table examining only one level of $Z$ is called a partial table. On the other hand, the combined counts of $X$ and $Y$ across all levels of $Z$, that is, a simple two-way contingency table ignoring $Z$, produce the marginal table. These concepts are described in depth by @Agresti [section 2.7.1].

Example

We will use the `Titanic` dataset in the `datasets` package to illustrate. This dataset is a four-dimensional table which includes the Class (1^st^, 2^nd^, 3^rd^, Crew), Sex (Male, Female), Age (Child, Adult), and Survival (No, Yes) of the passengers and crew of the 1912 maritime disaster. Use `help("Titanic", "datasets")` to find more information.

data(Titanic, package = "datasets")
str(Titanic)

For this illustration, we will remove the age dimension, transforming the four-dimensional table into a three-dimensional table. Let $X$ = sex, $Y$ = survival, and $Z$ = class. This dimensionality reduction is accomplished using the `margin.table()` function in the `base` package.

partial_tables <- margin.table(Titanic, c(2,4,1))
partial_tables

Each of the tables above is a partial table: survival by sex at a fixed level of class. The tables can be flattened for easier viewing using the `ftable()` function in the `stats` package (not shown).
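
As a brief sketch of that flattening step, the code below rebuilds the three-way table so that it runs on its own and passes it to `ftable()`. The choice of `Class` and `Sex` as row variables is an assumption made here for readability; other layouts work equally well.

```r
# Rebuild the three-way table (Sex x Survived x Class) from the built-in data
data(Titanic, package = "datasets")
partial_tables <- margin.table(Titanic, c(2, 4, 1))

# Flatten: Class and Sex index the rows, Survived indexes the columns
flat <- ftable(partial_tables, row.vars = c("Class", "Sex"))
flat
```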

The code below shows the marginal table of survival by sex, ignoring class. Again, the dimensionality is reduced using the `margin.table()` function.

marginal_table <- margin.table(Titanic, c(2,4))
marginal_table
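
The relationship between the partial tables and the marginal table can be checked directly: summing the partial tables cell by cell over the levels of $Z$ should reproduce the marginal table. The sketch below rebuilds both objects so that it is self-contained; the name `collapsed` is introduced here only for illustration.

```r
data(Titanic, package = "datasets")
partial_tables <- margin.table(Titanic, c(2, 4, 1))  # Sex x Survived x Class
marginal_table <- margin.table(Titanic, c(2, 4))     # Sex x Survived

# Sum the partial tables over the third dimension (Class)
collapsed <- apply(partial_tables, c(1, 2), sum)

# The collapsed counts match the marginal table
all.equal(as.vector(collapsed), as.vector(marginal_table))
```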

As an aside, we may get the table, row, or column proportions using the `prop.table()` function. Because the `Titanic` dataset is a multidimensional table, it must first be transformed into a two-dimensional table using `margin.table()` (as was performed above). Failure to do so will produce unexpected results.

# Table proportions
prop.table(marginal_table)

# Row proportions
prop.table(marginal_table, 1)

# Column proportions
prop.table(marginal_table, 2)

Conditional, Marginal, and Common Odds Ratios

In comparing variables $X$ and $Y$ at a fixed level $j$ of $Z$, we may use a conditional odds ratio, described by @Agresti [section 2.7.4], to represent the point estimate of association between the two variables. We will denote it as $\theta_{XY(j)}$. The marginal odds ratio then refers to the odds ratio of $X$ and $Y$ generated by the marginal table, and is denoted by $\theta_{XY}$.

An odds ratio estimate ($\hat\theta$) can be calculated from a table or matrix with the `odds.ratio()` function in the `samplesizeCMH` package. The function accepts either a table of frequencies or a table of proportions, as the two are algebraically equivalent for this calculation.

Demonstration of Algebraic Equivalence of Frequencies and Proportions in Odds Ratio Calculation

Using proportions, we see how the ratio of the row odds $o_1$ and $o_2$ is estimated.

$$ \hat{\theta} = \frac{\hat{o}_1}{\hat{o}_2} = \frac{\hat{\pi}_{11} / \hat{\pi}_{12}}{\hat{\pi}_{21} / \hat{\pi}_{22}} = \frac{\hat{\pi}_{11}\hat{\pi}_{22}}{\hat{\pi}_{12}\hat{\pi}_{21}}. $$

And since row odds estimates are related to cell counts through the following,

$$ \hat{o} = \frac{\hat{\pi}_1}{1 - \hat{\pi}_1} = \frac{\hat{\pi}_1}{\hat{\pi}_2} = \frac{n_1 / n_+}{n_2 / n_+} = \frac{n_1}{n_2}, $$

the odds estimate, defined as $\frac{\hat{\pi}_1}{\hat{\pi}_2}$, is equivalent to $\frac{n_1}{n_2}$. Therefore,

$$ \hat{\theta} = \frac{\hat{\pi}_{11}\hat{\pi}_{22}}{\hat{\pi}_{12}\hat{\pi}_{21}} = \frac{n_{11}n_{22}}{n_{12}n_{21}}. $$
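
The equivalence can also be checked numerically. The sketch below uses only base R; `cross_product_ratio()` is a hypothetical helper written for this illustration and is not a function from any package. It applies the cross-product formula once to cell counts and once to cell proportions.

```r
# Hypothetical helper (not from any package): cross-product ratio of a 2 x 2 table
cross_product_ratio <- function(tab) {
  (tab[1, 1] * tab[2, 2]) / (tab[1, 2] * tab[2, 1])
}

data(Titanic, package = "datasets")
counts <- margin.table(Titanic, c(2, 4))  # Sex x Survived cell counts
props  <- prop.table(counts)              # the same table as proportions

or_counts <- cross_product_ratio(counts)
or_props  <- cross_product_ratio(props)

# The common denominator cancels, so the two estimates agree
all.equal(or_counts, or_props)
```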

Example

Let's first look at the marginal odds ratio of survival by sex using the Titanic data.

library(samplesizeCMH)

odds.ratio(marginal_table)

The conditional odds ratios can be calculated using the partial tables.

apply(partial_tables, 3, odds.ratio)

The conditional odds ratios are more informative than the single marginal odds ratio. Based on what we see above, the association between survival and sex varies widely by class: women in 1^st^ class survived at a much higher rate than men, whereas women in 3^rd^ class had a considerably smaller advantage over their male counterparts.
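
To look at survival rates directly rather than odds ratios, one option is to compute proportions within each sex-by-class combination. The sketch below is one way to do this in base R, assuming the same three-way table as above; passing a margin of `c(1, 3)` to `prop.table()` conditions on both sex and class, and the name `survival_rates` is introduced here for illustration.

```r
data(Titanic, package = "datasets")
partial_tables <- margin.table(Titanic, c(2, 4, 1))  # Sex x Survived x Class

# Proportions within each Sex-by-Class combination, so that the
# "No" and "Yes" entries sum to 1 for every sex within every class
survival_rates <- prop.table(partial_tables, c(1, 3))

# Keep only the proportion surviving, arranged as Sex x Class
round(survival_rates[, "Yes", ], 3)
```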

We can produce a common (weighted) odds ratio using `mantelhaen.test()` from the `stats` package. Note that it differs slightly from the marginal odds ratio above because it combines the partial tables with weights that reflect their sizes instead of pooling all counts into a single table.

mantelhaen.test(partial_tables)
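
For readers who want to see where the common odds ratio comes from, the Mantel-Haenszel estimator can be written as $\hat\theta_{MH} = \sum_k (n_{11k} n_{22k} / n_{++k}) \big/ \sum_k (n_{12k} n_{21k} / n_{++k})$, where $k$ indexes the partial tables [@Mantel1959]. The sketch below computes it by hand in base R and compares it with the estimate reported by `mantelhaen.test()`; the names `n_k`, `num_k`, `den_k`, and `mh_by_hand` are introduced here for illustration.

```r
data(Titanic, package = "datasets")
partial_tables <- margin.table(Titanic, c(2, 4, 1))  # Sex x Survived x Class

# Per-stratum pieces of the Mantel-Haenszel estimator
n_k   <- apply(partial_tables, 3, sum)  # total count in each partial table
num_k <- partial_tables[1, 1, ] * partial_tables[2, 2, ] / n_k
den_k <- partial_tables[1, 2, ] * partial_tables[2, 1, ] / n_k

mh_by_hand <- sum(num_k) / sum(den_k)
mh_by_hand

# Compare with the estimate reported by stats::mantelhaen.test()
mh_from_test <- unname(mantelhaen.test(partial_tables)$estimate)
all.equal(mh_by_hand, mh_from_test)
```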

Conditional and Marginal Association/Independence

The term conditional association refers to the association of the $X$ and $Y$ variables conditional on the level of $Z$. Likewise, the marginal association refers to the overall association between $X$ and $Y$ while ignoring $Z$.

A finding of conditional association does not imply marginal association, nor vice versa. Using the CMH test to control for the stratifying variable guards against the well-documented phenomenon of Simpson's paradox, in which an association between two variables appears in the marginal table but vanishes, or even reverses direction, once the stratification is taken into account. The reverse situation may also arise: no association is found between the binary variables in the marginal table, yet one is observed once the third variable is introduced.
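
A small numerical sketch may make the distinction concrete. The counts below are hypothetical, invented purely for illustration, and `cross_product_ratio()` is a helper defined here rather than a package function. Within each level of $Z$ the odds ratio exceeds 1, yet the odds ratio of the collapsed table falls below 1.

```r
# Hypothetical counts, invented for illustration only:
# X (rows) by Y (columns) within two levels of Z
toy <- array(
  c(90, 800, 10, 200,    # Z = z1, filled column by column
    300, 20, 700, 80),   # Z = z2, filled column by column
  dim = c(2, 2, 2),
  dimnames = list(X = c("x1", "x2"), Y = c("y1", "y2"), Z = c("z1", "z2"))
)

# Hypothetical helper: cross-product ratio of a 2 x 2 table
cross_product_ratio <- function(tab) {
  (tab[1, 1] * tab[2, 2]) / (tab[1, 2] * tab[2, 1])
}

# Conditional odds ratios, one per level of Z
conditional <- apply(toy, 3, cross_product_ratio)
conditional

# Marginal odds ratio after collapsing over Z
marginal <- cross_product_ratio(apply(toy, c(1, 2), sum))
marginal
```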

Refer to @Agresti [section 2.7.3] for more information on the content of this section.

Homogeneous Association/Independence

Homogeneous association holds when the odds ratios between binary variables $X$ and $Y$ are equal across all $J$ levels of variable $Z$, such that $$\theta_{XY(1)}=\theta_{XY(2)}=\cdots=\theta_{XY(J)}.$$ [@Agresti, section 2.7.6]

The Breslow-Day test can be used to check the null hypothesis that all odds ratios are equal. The Cochran-Mantel-Haenszel test can be thought of as a special case of the Breslow-Day test wherein the common odds ratio is assumed to be 1 (however, the calculations are not equivalent). Using the Titanic data, we can perform the Breslow-Day test using `BreslowDayTest()` from the `DescTools` package.

library(DescTools)

BreslowDayTest(x = partial_tables, OR = 1)

Note the near agreement with the output from `mantelhaen.test()` from the section above.

References



samplesizeCMH documentation built on May 2, 2019, 6:38 a.m.