contingency2xt: Kruskal-Wallis Test for the 2 x t Contingency Table

View source: R/contingency2xt.R

contingency2xtR Documentation

Kruskal-Wallis Test for the 2 x t Contingency Table

Description

This function uses the Kruskal-Wallis criterion to test the hypothesis of no association between the counts for two responses "A" and "B" across t categories.

Usage

contingency2xt(Avec, Bvec, 
	method = c("asymptotic", "simulated", "exact"), 
	dist = FALSE, tab0 = TRUE, Nsim = 1e+06)

Arguments

Avec

vector of length t giving the counts A_1,\ldots, A_t for response "A" according to t categories. m = A_1 + \ldots + A_t.

Bvec

vector of length t giving the counts B_1,\ldots, B_t for response "B" according to t categories. n = B_1 + \ldots + B_t = N-m.

method

= c("asymptotic","simulated","exact"), where

"asymptotic" uses only an asymptotic chi-square approximation with t-1 degrees of freedom to approximate the P-value. This calculation is always done.

"simulated" uses Nsim simulated counts for Avec and Bvec with the observed marginal totals, m, n, d = Avec+Bvec, to estimate the P-value.

"exact" enumerates all counts for Avec and Bvec with the observed marginal totals to get an exact P-value. It is used only when Nsim is at least as large as the number choose(m+t-1,t-1) of full enumerations. Otherwise, method reverts to "simulated" using the given Nsim.

dist

FALSE (default) or TRUE. If dist = TRUE, the distribution of the simulated or fully enumerated Kruskal-Wallis test statistics is returned as null.dist, if dist = FALSE the value of null.dist is NULL. The coice dist = TRUE also limits Nsim <- min(Nsim,1e8).

tab0

TRUE (default) or FALSE. If tab0 = TRUE, the null distribution is returned in 2 column matrix form when method = "simulated". When tab0 = FALSE the simulated null distribution is returned as a vector of all simulated values of the test statistic.

Nsim

=10000 (default), number of simulated Avec splits to use. It is only used when method = "simulated", or when method = "exact" reverts to method = "simulated", as previously explained.

Details

For this data scenario the Kruskal-Wallis criterion is

K.star = \frac{N(N-1)}{mn}(\sum\frac{A_i^2}{d_i}-\frac{m^2}{N})

with d_i=A_i+B_i, treating "A" responses as 1 and "B" responses as 2, and using midranks as explained in Lehmann (2006), Chapter 5.3.

For small sample sizes exact null distribution calculations are possible, based on Algorithm C (Chase's sequence) in Knuth (2011), which allows the enumeration of all possible splits of m into counts A_1,\ldots, A_t such that m = A_1 + \ldots + A_t, followed by the calculation of the statistic K.star for each such split. Simulation of A_1,\ldots, A_t uses the probability model (5.35) in Lehmann (2006) to successively generate hypergeometric counts A_1,\ldots, A_t. Both these processes, enumeration and simulation, are done in C.

Value

A list of class kSamples with components

test.name

"2 x t Contingency Table"

t

number of classification categories

KW.cont

2 (3) vector giving the observed KW statistic, its asymptotic P-value (and simulated or exact P-value)

null.dist

simulated or enumerated null distribution of the test statistic. It is given as an M by 2 matrix, where the first column (named KW) gives the M unique ordered values of the Kruskal-Wallis statistic and the second column (named prob) gives the corresponding (simulated or exact) probabilities.

This format of null.dist is returned when method = "exact" and dist = TRUE or when method = "simulated" and dist = TRUE and tab0 = TRUE are specified.

For method = "simulated", dist = TRUE, and tab0 = FALSE the null distribution null.dist is returned as the vector of all simulated test statistic values. This is used in contingency2xt.comb in the simulation mode.

null.dist = NULL is returned when dist = FALSE or when method = "asymptotic".

method

the method used.

Nsim

the number of simulations.

warning

method = "exact" should only be used with caution. Computation time is proportional to the number of enumerations. In most cases dist = TRUE should not be used, i.e., when the returned distribution objects become too large for R's work space.

References

Knuth, D.E. (2011), The Art of Computer Programming, Volume 4A Combinatorial Algorithms Part 1, Addison-Wesley

Kruskal, W.H. (1952), A Nonparametric Test for the Several Sample Problem, The Annals of Mathematical Statistics, Vol 23, No. 4, 525-540

Kruskal, W.H. and Wallis, W.A. (1952), Use of Ranks in One-Criterion Variance Analysis, Journal of the American Statistical Association, Vol 47, No. 260, 583–621.

Lehmann, E.L. (2006), Nonparametrics, Statistical Methods Based on Ranks, Revised First Edition, Springer, New York.

Examples

contingency2xt(c(25,15,20),c(16,6,18),method="exact",dist=FALSE,
	tab0=TRUE,Nsim=1e3)

kSamples documentation built on Oct. 8, 2023, 1:07 a.m.