ContingencyTests: Tests of Independence in Two- or Three-Way Contingency Tables

Description

Tests of the independence of two nominal or ordered factors, optionally conditional on a blocking factor.

Usage

## S3 method for class 'formula'
chisq_test(formula, data, subset = NULL, weights = NULL, ...)
## S3 method for class 'table'
chisq_test(object, ...)
## S3 method for class 'IndependenceProblem'
chisq_test(object, ...)

## S3 method for class 'formula'
cmh_test(formula, data, subset = NULL, weights = NULL, ...)
## S3 method for class 'table'
cmh_test(object, ...)
## S3 method for class 'IndependenceProblem'
cmh_test(object, ...)

## S3 method for class 'formula'
lbl_test(formula, data, subset = NULL, weights = NULL, ...)
## S3 method for class 'table'
lbl_test(object, ...)
## S3 method for class 'IndependenceProblem'
lbl_test(object, ...)

Arguments

formula

a formula of the form y ~ x | block where y and x are factors and block is an optional factor for stratification.

data

an optional data frame containing the variables in the model formula.

subset

an optional vector specifying a subset of observations to be used. Defaults to NULL.

weights

an optional formula of the form ~ w defining integer valued case weights for each observation. Defaults to NULL, implying equal weight for all observations.

object

an object inheriting from classes "table" or "IndependenceProblem".

...

further arguments to be passed to independence_test.

Details

chisq_test, cmh_test and lbl_test provide the Pearson chi-squared test, the generalized Cochran-Mantel-Haenszel test and the linear-by-linear association test. A general description of these methods is given by Agresti (2002).

The null hypothesis of independence, or conditional independence given block, between y and x is tested.

If y and/or x are ordered factors, the default scores, 1:nlevels(y) and 1:nlevels(x) respectively, can be altered using the scores argument (see independence_test); this argument can also be used to coerce nominal factors to class "ordered". (lbl_test coerces to class "ordered" under any circumstances.) If both y and x are ordered factors, a linear-by-linear association test is computed and the direction of the alternative hypothesis can be specified using the alternative argument. For the Pearson chi-squared test, this extension was given by Yates (1948) who also discussed the situation when either the response or the covariate is an ordered factor; see also Cochran (1954) and Armitage (1955) for the particular case when y is a binary factor and x is ordered. The Mantel-Haenszel statistic (Mantel and Haenszel, 1959) was similarly extended by Mantel (1963) and Landis, Heyman and Koch (1978).
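As a rough illustration of the scored statistic described above, the following base R sketch recomputes the squared statistic for the Cochran (1954) data from the Examples section as n times the squared correlation between row and column scores. The helper name score_r2 and the 0/1 column scores are illustrative choices and are not part of coin. That chisq_test scales by n (rather than by n - 1, as in Agresti's M^2) is an assumption inferred from the value 6.665691 in the example output, not a statement about the coin sources.

```r
## Hypothetical helper (not part of coin): squared Pearson correlation
## between row and column scores, weighted by the cell counts.
score_r2 <- function(tab, row_scores, col_scores) {
    w <- as.vector(tab)                    # cell counts, column-major
    u <- row_scores[as.vector(row(tab))]   # row score of each cell
    v <- col_scores[as.vector(col(tab))]   # column score of each cell
    n <- sum(w)
    sxy <- sum(w * u * v) - sum(w * u) * sum(w * v) / n
    sxx <- sum(w * u^2) - sum(w * u)^2 / n
    syy <- sum(w * v^2) - sum(w * v)^2 / n
    sxy^2 / (sxx * syy)
}

## Cochran (1954, Tab. 6), as in the Examples section
cochran <- matrix(
    c(11,  7,
      27, 15,
      42, 16,
      53, 13,
      11,  1),
    byrow = TRUE, ncol = 2
)
n  <- sum(cochran)
r2 <- score_r2(cochran,
               row_scores = c(3, 2, 1, 0, -1),
               col_scores = c(0, 1))

x2_n  <- n * r2        # approx. 6.6657, cf. statistic(ct)^2 in the Examples
x2_n1 <- (n - 1) * r2  # the (n - 1) scaling used for Agresti's M^2
```

The two scalings differ only by the factor n / (n - 1), so the distinction matters mainly for small tables.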

The conditional null distribution of the test statistic is used to obtain p-values, and an asymptotic approximation of the exact distribution is used by default (distribution = "asymptotic"). Alternatively, the distribution can be approximated via Monte Carlo resampling or, for univariate two-sample problems, computed exactly by setting distribution to "approximate" or "exact", respectively. See asymptotic, approximate and exact for details.
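The idea behind the Monte Carlo approximation can be sketched in base R as follows: expand the table to one row per observation, permute one factor against the other and recompute the Pearson statistic each time. This is only a minimal sketch of the resampling principle and not coin's implementation. The helper pearson_x2, the seed and the tie tolerance are illustrative assumptions.

```r
set.seed(42)  # arbitrary seed, only to make the sketch reproducible

## Davis (1986, p. 140), as in the Examples section
davis <- as.table(matrix(c(3, 6, 2, 19), nrow = 2, byrow = TRUE))

## Expand the table to case-level data (one row per observation)
cells <- as.data.frame(davis)  # columns Var1, Var2, Freq
cases <- cells[rep(seq_len(nrow(cells)), cells$Freq), c("Var1", "Var2")]

## Hypothetical helper: Pearson chi-squared statistic of two factors
pearson_x2 <- function(x, y) {
    tab <- table(x, y)
    expected <- outer(rowSums(tab), colSums(tab)) / sum(tab)
    sum((tab - expected)^2 / expected)
}

x2_obs <- pearson_x2(cases$Var1, cases$Var2)

## Permuting one factor keeps both margins fixed, i.e. it samples from
## the conditional null distribution
nresample <- 10000
x2_perm <- replicate(nresample,
                     pearson_x2(cases$Var1, sample(cases$Var2)))

## Small tolerance so that tables tied with the observed one are counted
p_mc <- mean(x2_perm >= x2_obs - 1e-8)
```

With 10000 resamples, p_mc should land close to the exact conditional p-value of about 0.286 reported in the example output, up to Monte Carlo error.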

Value

An object inheriting from class "IndependenceTest".

Note

The exact versions of the Pearson chi-squared test and the generalized Cochran-Mantel-Haenszel test do not necessarily result in the same p-value as Fisher's exact test (Davis, 1986).
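For a 2 x 2 table the difference can be traced by hand, because the conditional null distribution is hypergeometric. The base R sketch below enumerates all tables with the margins of the Davis (1986) data and orders them once by the Pearson statistic and once by their probability, which is the ordering used by fisher.test. The variable names and the tie tolerance are illustrative choices and are not part of coin.

```r
## Davis (1986, p. 140), as in the Examples section
davis <- matrix(c(3, 6, 2, 19), nrow = 2, byrow = TRUE)
r1 <- sum(davis[1, ]); r2 <- sum(davis[2, ])
c1 <- sum(davis[, 1]); c2 <- sum(davis[, 2])
n  <- sum(davis)

## Possible values of the upper-left cell given the margins
k <- max(0, r1 + c1 - n):min(r1, c1)
prob <- dhyper(k, m = r1, n = r2, k = c1)

## Pearson statistic of each candidate table (2 x 2 shortcut formula)
x2 <- n * (k * n - r1 * c1)^2 / (r1 * r2 * c1 * c2)
obs <- which(k == davis[1, 1])
tol <- 1e-8

## Exact Pearson test: tables with a statistic at least as large
p_pearson <- sum(prob[x2 >= x2[obs] - tol])
## Fisher's exact test: tables at most as probable as the observed one
p_fisher <- sum(prob[prob <= prob[obs] * (1 + 1e-7)])
## Mid-p-value: count tables tied with the observed statistic only half
mid_p <- p_pearson - 0.5 * sum(prob[abs(x2 - x2[obs]) < tol])

round(c(pearson = p_pearson, fisher = p_fisher, mid = mid_p), 4)
## pearson 0.2860, fisher 0.1432, mid 0.1527
```

The table with a zero in the upper-left cell has the same Pearson statistic as the observed table but a larger probability, so it enters the exact Pearson p-value and not Fisher's. This accounts for the gap between 0.2860 and 0.1432 in the example output.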

References

Agresti, A. (2002). Categorical Data Analysis, Second Edition. Hoboken, New Jersey: John Wiley & Sons.

Armitage, P. (1955). Tests for linear trends in proportions and frequencies. Biometrics 11(3), 375–386. doi: 10.2307/3001775

Cochran, W.G. (1954). Some methods for strengthening the common χ^2 tests. Biometrics 10(4), 417–451. doi: 10.2307/3001616

Davis, L. J. (1986). Exact tests for 2 x 2 contingency tables. The American Statistician 40(2), 139–141. doi: 10.1080/00031305.1986.10475377

Landis, J. R., Heyman, E. R. and Koch, G. G. (1978). Average partial association in three-way contingency tables: a review and discussion of alternative tests. International Statistical Review 46(3), 237–254. doi: 10.2307/1402373

Mantel, N. and Haenszel, W. (1959). Statistical aspects of the analysis of data from retrospective studies of disease. Journal of the National Cancer Institute 22(4), 719–748. doi: 10.1093/jnci/22.4.719

Mantel, N. (1963). Chi-square tests with one degree of freedom: extensions of the Mantel-Haenszel procedure. Journal of the American Statistical Association 58(303), 690–700. doi: 10.1080/01621459.1963.10500879

Yates, F. (1948). The analysis of contingency tables with groupings based on quantitative characters. Biometrika 35(1/2), 176–181. doi: 10.1093/biomet/35.1-2.176

Examples

library("coin")

## Example data
## Davis (1986, p. 140)
davis <- matrix(
    c(3,  6,
      2, 19),
    nrow = 2, byrow = TRUE
)
davis <- as.table(davis)

## Asymptotic Pearson chi-squared test
chisq_test(davis)
chisq.test(davis, correct = FALSE) # same as above

## Approximative (Monte Carlo) Pearson chi-squared test
ct <- chisq_test(davis,
                 distribution = approximate(nresample = 10000))
pvalue(ct)             # standard p-value
midpvalue(ct)          # mid-p-value
pvalue_interval(ct)    # p-value interval
size(ct, alpha = 0.05) # test size at alpha = 0.05 using the p-value

## Exact Pearson chi-squared test (Davis, 1986)
## Note: disagrees with Fisher's exact test
ct <- chisq_test(davis,
                 distribution = "exact")
pvalue(ct)             # standard p-value
midpvalue(ct)          # mid-p-value
pvalue_interval(ct)    # p-value interval
size(ct, alpha = 0.05) # test size at alpha = 0.05 using the p-value
fisher.test(davis)


## Laryngeal cancer data
## Agresti (2002, p. 107, Tab. 3.13)
cancer <- matrix(
    c(21, 2,
      15, 3),
    nrow = 2, byrow = TRUE,
    dimnames = list(
        "Treatment" = c("Surgery", "Radiation"),
           "Cancer" = c("Controlled", "Not Controlled")
    )
)
cancer <- as.table(cancer)

## Exact Pearson chi-squared test (Agresti, 2002, p. 108, Tab. 3.14)
## Note: agrees with Fisher's exact test
(ct <- chisq_test(cancer,
                  distribution = "exact"))
midpvalue(ct)          # mid-p-value
pvalue_interval(ct)    # p-value interval
size(ct, alpha = 0.05) # test size at alpha = 0.05 using the p-value
fisher.test(cancer)


## Homework conditions and teacher's rating
## Yates (1948, Tab. 1)
yates <- matrix(
    c(141, 67, 114, 79, 39,
      131, 66, 143, 72, 35,
       36, 14,  38, 28, 16),
    byrow = TRUE, ncol = 5,
    dimnames = list(
           "Rating" = c("A", "B", "C"),
        "Condition" = c("A", "B", "C", "D", "E")
    )
)
yates <- as.table(yates)

## Asymptotic Pearson chi-squared test (Yates, 1948, p. 176)
chisq_test(yates)

## Asymptotic Pearson-Yates chi-squared test (Yates, 1948, pp. 180-181)
## Note: 'Rating' and 'Condition' as ordinal
(ct <- chisq_test(yates,
                  alternative = "less",
                  scores = list("Rating" = c(-1, 0, 1),
                                "Condition" = c(2, 1, 0, -1, -2))))
statistic(ct)^2 # chi^2 = 2.332

## Asymptotic Pearson-Yates chi-squared test (Yates, 1948, p. 181)
## Note: 'Rating' as ordinal
chisq_test(yates,
           scores = list("Rating" = c(-1, 0, 1))) # Q = 3.825


## Change in clinical condition and degree of infiltration
## Cochran (1954, Tab. 6)
cochran <- matrix(
    c(11,  7,
      27, 15,
      42, 16,
      53, 13,
      11,  1),
    byrow = TRUE, ncol = 2,
    dimnames = list(
              "Change" = c("Marked", "Moderate", "Slight",
                           "Stationary", "Worse"),
        "Infiltration" = c("0-7", "8-15")
    )
)
cochran <- as.table(cochran)

## Asymptotic Pearson chi-squared test (Cochran, 1954, p. 435)
chisq_test(cochran) # X^2 = 6.88

## Asymptotic Cochran-Armitage test (Cochran, 1954, p. 436)
## Note: 'Change' as ordinal
(ct <- chisq_test(cochran,
                  scores = list("Change" = c(3, 2, 1, 0, -1))))
statistic(ct)^2 # X^2 = 6.66


## Change in size of ulcer crater for two treatment groups
## Armitage (1955, Tab. 2)
armitage <- matrix(
    c( 6, 4, 10, 12,
      11, 8,  8,  5),
    byrow = TRUE, ncol = 4,
    dimnames = list(
        "Treatment" = c("A", "B"),
           "Crater" = c("Larger", "< 2/3 healed",
                        ">= 2/3 healed", "Healed")
    )
)
armitage <- as.table(armitage)

## Approximative (Monte Carlo) Pearson chi-squared test (Armitage, 1955, p. 379)
chisq_test(armitage,
           distribution = approximate(nresample = 10000)) # chi^2 = 5.91

## Approximative (Monte Carlo) Cochran-Armitage test (Armitage, 1955, p. 379)
(ct <- chisq_test(armitage,
                  distribution = approximate(nresample = 10000),
                  scores = list("Crater" = c(-1.5, -0.5, 0.5, 1.5))))
statistic(ct)^2 # chi_0^2 = 5.26


## Relationship between job satisfaction and income stratified by gender
## Agresti (2002, p. 288, Tab. 7.8)

## Asymptotic generalized Cochran-Mantel-Haenszel test (Agresti, p. 297)
(ct <- cmh_test(jobsatisfaction)) # CMH = 10.2001

## The standardized linear statistic
statistic(ct, type = "standardized")

## The standardized linear statistic for each block
statistic(ct, type = "standardized", partial = TRUE)

## Asymptotic generalized Cochran-Mantel-Haenszel test (Agresti, p. 297)
## Note: 'Job.Satisfaction' as ordinal
cmh_test(jobsatisfaction,
         scores = list("Job.Satisfaction" = c(1, 3, 4, 5))) # L^2 = 9.0342

## Asymptotic linear-by-linear association test (Agresti, p. 297)
## Note: 'Job.Satisfaction' and 'Income' as ordinal
(lt <- lbl_test(jobsatisfaction,
                scores = list("Job.Satisfaction" = c(1, 3, 4, 5),
                              "Income" = c(3, 10, 20, 35))))
statistic(lt)^2 # M^2 = 6.1563

## The standardized linear statistic
statistic(lt, type = "standardized")

## The standardized linear statistic for each block
statistic(lt, type = "standardized", partial = TRUE)

Example output

Loading required package: survival

	Asymptotic Pearson Chi-Squared Test

data:  Var2 by Var1 (A, B)
chi-squared = 2.5714, df = 1, p-value = 0.1088


	Pearson's Chi-squared test

data:  davis
X-squared = 2.5714, df = 1, p-value = 0.1088

Warning message:
In chisq.test(davis, correct = FALSE) :
  Chi-squared approximation may be incorrect
[1] 0.2884
99 percent confidence interval:
 0.2767843 0.3002182 

[1] 0.15385
99 percent confidence interval:
 0.1446722 0.1632571 

   p_0    p_1 
0.0193 0.2884 
[1] 0.0193
[1] 0.2860301
[1] 0.1527409
       p_0        p_1 
0.01945181 0.28603006 
[1] 0.01945181

	Fisher's Exact Test for Count Data

data:  davis
p-value = 0.1432
alternative hypothesis: true odds ratio is not equal to 1
95 percent confidence interval:
  0.410317 65.723900
sample estimates:
odds ratio 
  4.462735 


	Exact Pearson Chi-Squared Test

data:  Cancer by Treatment (Surgery, Radiation)
chi-squared = 0.59915, p-value = 0.6384

[1] 0.5006832
      p_0       p_1 
0.3629407 0.6384258 
[1] 0.01143318

	Fisher's Exact Test for Count Data

data:  cancer
p-value = 0.6384
alternative hypothesis: true odds ratio is not equal to 1
95 percent confidence interval:
  0.2089115 27.5538747
sample estimates:
odds ratio 
  2.061731 


	Asymptotic Pearson Chi-Squared Test

data:  Condition by Rating (A, B, C)
chi-squared = 9.0928, df = 8, p-value = 0.3345


	Asymptotic Linear-by-Linear Association Test

data:  Condition (ordered) by Rating (A < B < C)
Z = -1.5269, p-value = 0.06339
alternative hypothesis: less

[1] 2.33154

	Asymptotic Generalized Pearson Chi-Squared Test

data:  Condition by Rating (A < B < C)
chi-squared = 3.8242, df = 4, p-value = 0.4303


	Asymptotic Pearson Chi-Squared Test

data:  Infiltration by
	 Change (Marked, Moderate, Slight, Stationary, Worse)
chi-squared = 6.8807, df = 4, p-value = 0.1423


	Asymptotic Linear-by-Linear Association Test

data:  Infiltration by
	 Change (Marked < Moderate < Slight < Stationary < Worse)
Z = -2.5818, p-value = 0.009829
alternative hypothesis: two.sided

[1] 6.665691

	Approximative Pearson Chi-Squared Test

data:  Crater by Treatment (A, B)
chi-squared = 5.9085, p-value = 0.1187


	Approximative Linear-by-Linear Association Test

data:  Crater (ordered) by Treatment (A, B)
Z = 2.2932, p-value = 0.0313
alternative hypothesis: two.sided

[1] 5.258804

	Asymptotic Generalized Cochran-Mantel-Haenszel Test

data:  Job.Satisfaction by
	 Income (<5000, 5000-15000, 15000-25000, >25000) 
	 stratified by Gender
chi-squared = 10.2, df = 9, p-value = 0.3345

            Very Dissatisfied A Little Satisfied Moderately Satisfied
<5000               1.3112789         0.69201053           -0.2478705
5000-15000          0.6481783         0.83462550            0.5175755
15000-25000        -1.0958361        -1.50130926            0.2361231
>25000             -1.0377629        -0.08983052           -0.5946119
            Very Satisfied
<5000           -0.9293458
5000-15000      -1.6257547
15000-25000      1.4614123
>25000           1.2031648
            Very Dissatisfied A Little Satisfied Moderately Satisfied
<5000               1.3112789         0.69201053           -0.2478705
5000-15000          0.6481783         0.83462550            0.5175755
15000-25000        -1.0958361        -1.50130926            0.2361231
>25000             -1.0377629        -0.08983052           -0.5946119
            Very Satisfied
<5000           -0.9293458
5000-15000      -1.6257547
15000-25000      1.4614123
>25000           1.2031648

	Asymptotic Generalized Cochran-Mantel-Haenszel Test

data:  Job.Satisfaction (ordered) by
	 Income (<5000, 5000-15000, 15000-25000, >25000) 
	 stratified by Gender
chi-squared = 9.0342, df = 3, p-value = 0.02884


	Asymptotic Linear-by-Linear Association Test

data:  Job.Satisfaction (ordered) by
	 Income (<5000 < 5000-15000 < 15000-25000 < >25000) 
	 stratified by Gender
Z = 2.4812, p-value = 0.01309
alternative hypothesis: two.sided

[1] 6.156301
        
 2.48119
        
 2.48119

coin documentation built on Oct. 8, 2021, 9:07 a.m.