IndependenceTest: General Independence Test

In coin: Conditional Inference Procedures in a Permutation Test Framework

Description

Testing the independence of two sets of variables measured on arbitrary scales.

Usage

```
## S3 method for class 'formula'
independence_test(formula, data, subset = NULL, weights = NULL, ...)
## S3 method for class 'table'
independence_test(object, ...)
## S3 method for class 'IndependenceProblem'
independence_test(object, teststat = c("maximum", "quadratic", "scalar"),
                  distribution = c("asymptotic", "approximate",
                                   "exact", "none"),
                  alternative = c("two.sided", "less", "greater"),
                  xtrafo = trafo, ytrafo = trafo, scores = NULL,
                  check = NULL, ...)
```

Arguments

`formula`: a formula of the form `y1 + ... + yq ~ x1 + ... + xp | block` where `y1`, ..., `yq` and `x1`, ..., `xp` are measured on arbitrary scales (nominal, ordinal or continuous with or without censoring) and `block` is an optional factor for stratification.

`data`: an optional data frame containing the variables in the model formula.

`subset`: an optional vector specifying a subset of observations to be used. Defaults to `NULL`.

`weights`: an optional formula of the form `~ w` defining integer valued case weights for each observation. Defaults to `NULL`, implying equal weight for all observations.

`object`: an object inheriting from classes `"table"` or `"IndependenceProblem"`.

`teststat`: a character, the type of test statistic to be applied: either a maximum statistic (`"maximum"`, default), a quadratic form (`"quadratic"`) or a standardized scalar test statistic (`"scalar"`).

`distribution`: a character, the conditional null distribution of the test statistic can be approximated by its asymptotic distribution (`"asymptotic"`, default) or via Monte Carlo resampling (`"approximate"`). Alternatively, the functions `asymptotic` or `approximate` can be used. For univariate two-sample problems, `"exact"` or use of the function `exact` computes the exact distribution. Computation of the null distribution can be suppressed by specifying `"none"`. It is also possible to specify a function with one argument (an object inheriting from `"IndependenceTestStatistic"`) that returns an object of class `"NullDistribution"`.

`alternative`: a character, the alternative hypothesis: either `"two.sided"` (default), `"greater"` or `"less"`.

`xtrafo`: a function of transformations to be applied to the variables `x1`, ..., `xp` supplied in `formula`; see ‘Details’. Defaults to `trafo`.

`ytrafo`: a function of transformations to be applied to the variables `y1`, ..., `yq` supplied in `formula`; see ‘Details’. Defaults to `trafo`.

`scores`: a named list of scores to be attached to ordered factors; see ‘Details’. Defaults to `NULL`, implying equally spaced scores.

`check`: a function to be applied to objects of class `"IndependenceTest"` in order to check for specific properties of the data. Defaults to `NULL`.

`...`: further arguments to be passed to or from other methods (currently ignored).

Details

`independence_test` provides a general independence test for two sets of variables measured on arbitrary scales. This function is based on the general framework for conditional inference procedures proposed by Strasser and Weber (1999). The salient parts of the Strasser-Weber framework are elucidated by Hothorn et al. (2006) and a thorough description of the software implementation is given by Hothorn et al. (2008).

The null hypothesis of independence, or conditional independence given `block`, between `y1`, ..., `yq` and `x1`, ..., `xp` is tested.

A vector of case weights, e.g., observation counts, can be supplied through the `weights` argument and the type of test statistic is specified by the `teststat` argument. Influence and regression functions, i.e., transformations of `y1`, ..., `yq` and `x1`, ..., `xp`, are specified by the `ytrafo` and `xtrafo` arguments respectively; see `trafo` for the collection of transformation functions currently available. This allows for implementation of both novel and familiar test statistics, e.g., the Pearson χ^2 test, the generalized Cochran-Mantel-Haenszel test, the Spearman correlation test, the Fisher-Pitman permutation test, the Wilcoxon-Mann-Whitney test, the Kruskal-Wallis test and the family of weighted logrank tests for censored data. Furthermore, multivariate extensions such as the multivariate Kruskal-Wallis test (Puri and Sen, 1966, 1971) can be implemented without much effort (see ‘Examples’).
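As a hedged illustration of how a transformation recovers a familiar test, the sketch below applies `rank_trafo` to the response of the `asat` data (the data set used in ‘Examples’) and compares the result with `wilcox_test`, coin's convenience function for the Wilcoxon-Mann-Whitney test. The accessors `statistic` and `pvalue` are exported by coin but are not described on this page, and the object names `it` and `wmw` are purely illustrative.

```r
library(coin)

## Wilcoxon-Mann-Whitney test assembled by hand: rank the response and
## keep the default indicator coding for the two-level factor
it <- independence_test(asat ~ group, data = asat,
                        teststat = "scalar",
                        ytrafo = function(data)
                            trafo(data, numeric_trafo = rank_trafo))

## The same test via the convenience function
wmw <- wilcox_test(asat ~ group, data = asat)

## Both calls should report the same standardized statistic and p-value
all.equal(as.numeric(statistic(it)), as.numeric(statistic(wmw)))
all.equal(as.numeric(pvalue(it)), as.numeric(pvalue(wmw)))
```

Swapping `rank_trafo` for another transformation function, such as `normal_trafo`, changes the test without changing the call structure.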

If, say, `y1` and/or `x1` are ordered factors, the default scores, `1:nlevels(y1)` and `1:nlevels(x1)` respectively, can be altered using the `scores` argument; this argument can also be used to coerce nominal factors to class `"ordered"`. For example, when `y1` is an ordered factor with four levels and `x1` is a nominal factor with three levels, `scores = list(y1 = c(1, 3:5), x1 = c(1:2, 4))` supplies the scores to be used. For ordered alternatives the scores must be monotonic, but non-monotonic scores are also allowed for testing against, e.g., umbrella alternatives. The length of the score vector must be equal to the number of factor levels.
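The `scores` example from the paragraph above can be tried on a small data set. The following is only a sketch: the data frame `toy`, its level labels and its counts are invented for illustration and do not come from the coin package. The counts are passed as case weights through the `weights` formula.

```r
library(coin)

## Hypothetical toy data (invented for illustration): counts for an
## ordered response y1 (four levels) by a nominal factor x1 (three levels)
lev_y <- c("none", "mild", "moderate", "severe")
toy <- data.frame(
    y1 = factor(rep(lev_y, times = 3), levels = lev_y, ordered = TRUE),
    x1 = factor(rep(c("A", "B", "C"), each = 4)),
    n  = c(8L, 5L, 3L, 2L,
           4L, 6L, 5L, 3L,
           2L, 3L, 6L, 8L)
)

## Default: equally spaced scores 1:4 for y1, x1 treated as nominal
it_default <- independence_test(y1 ~ x1, data = toy, weights = ~ n)

## Custom scores: unequal spacing for y1, and x1 coerced to "ordered"
it_scores <- independence_test(y1 ~ x1, data = toy, weights = ~ n,
                               scores = list(y1 = c(1, 3:5),
                                             x1 = c(1:2, 4)))

pvalue(it_default)
pvalue(it_scores)
```

Each score vector has exactly as many entries as the corresponding factor has levels, as required.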

The conditional null distribution of the test statistic is used to obtain p-values and an asymptotic approximation of the exact distribution is used by default (`distribution = "asymptotic"`). Alternatively, the distribution can be approximated via Monte Carlo resampling or computed exactly for univariate two-sample problems by setting `distribution` to `"approximate"` or `"exact"` respectively. See `asymptotic`, `approximate` and `exact` for details.
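A minimal sketch of the three choices, again using the `asat` data from ‘Examples’: the seed value and the number of resamples are arbitrary choices made here for reproducibility, and `pvalue` is a coin accessor not described on this page.

```r
library(coin)

## Default: asymptotic approximation of the conditional null distribution
p_asym <- pvalue(independence_test(asat ~ group, data = asat))

## Monte Carlo approximation; the seed only makes the run reproducible
set.seed(290875)
p_mc <- pvalue(independence_test(asat ~ group, data = asat,
                                 distribution = approximate(nresample = 10000)))

## Exact distribution (available here because this is a univariate
## two-sample problem)
p_exact <- pvalue(independence_test(asat ~ group, data = asat,
                                    distribution = "exact"))

c(asymptotic  = as.numeric(p_asym),
  approximate = as.numeric(p_mc),
  exact       = as.numeric(p_exact))
```

Passing the function call `approximate(nresample = ...)` rather than the string `"approximate"` is what allows the number of Monte Carlo replicates to be set.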

Value

An object inheriting from class `"IndependenceTest"`.
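The returned object can be queried with accessor functions exported by coin (`statistic`, `expectation`, `covariance` and `pvalue`); these are not documented on this page, so the sketch below should be read as an illustration rather than a complete account of the interface. It uses the `asat` data from ‘Examples’ and checks that the standardized statistic follows from the linear statistic, its conditional expectation and its conditional variance.

```r
library(coin)

it <- independence_test(asat ~ group, data = asat, teststat = "scalar")

## Pieces of the test held by the returned object
lin <- statistic(it, type = "linear")  # linear statistic T
mu  <- expectation(it)                 # conditional expectation of T
sig <- covariance(it)                  # conditional covariance of T
z   <- statistic(it)                   # test statistic, here the standardized T

## With one response and a two-level factor everything is one-dimensional,
## so the standardized statistic is (T - mu) / sqrt(Sigma)
z_manual <- (as.numeric(lin) - as.numeric(mu)) / sqrt(as.numeric(sig))
all.equal(z_manual, as.numeric(z))

pvalue(it)
```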

Note

Starting with coin version 1.1-0, maximum statistics and quadratic forms can no longer be specified using `teststat = "maxtype"` and `teststat = "quadtype"` respectively (as was used in versions prior to 0.4-5).

References

Hothorn, T., Hornik, K., van de Wiel, M. A. and Zeileis, A. (2006). A Lego system for conditional inference. The American Statistician 60(3), 257–263. doi: 10.1198/000313006X118430

Hothorn, T., Hornik, K., van de Wiel, M. A. and Zeileis, A. (2008). Implementing a class of permutation tests: The coin package. Journal of Statistical Software 28(8), 1–23. doi: 10.18637/jss.v028.i08

Johnson, W. D., Mercante, D. E. and May, W. L. (1993). A computer package for the multivariate nonparametric rank test in completely randomized experimental designs. Computer Methods and Programs in Biomedicine 40(3), 217–225. doi: 10.1016/0169-2607(93)90059-T

Puri, M. L. and Sen, P. K. (1966). On a class of multivariate multisample rank order tests. Sankhya A 28(4), 353–376.

Puri, M. L. and Sen, P. K. (1971). Nonparametric Methods in Multivariate Analysis. New York: John Wiley & Sons.

Strasser, H. and Weber, C. (1999). On the asymptotic theory of permutation statistics. Mathematical Methods of Statistics 8(2), 220–250.

Examples

```
library(coin)

## One-sided exact van der Waerden (normal scores) test...
independence_test(asat ~ group, data = asat,
                  ## exact null distribution
                  distribution = "exact",
                  ## one-sided test
                  alternative = "greater",
                  ## apply normal scores to asat$asat
                  ytrafo = function(data)
                      trafo(data, numeric_trafo = normal_trafo),
                  ## indicator matrix of 1st level of asat$group
                  xtrafo = function(data)
                      trafo(data, factor_trafo = function(x)
                          matrix(x == levels(x)[1], ncol = 1)))

## ...or more conveniently
normal_test(asat ~ group, data = asat,
            ## exact null distribution
            distribution = "exact",
            ## one-sided test
            alternative = "greater")


## Receptor binding assay of benzodiazepines
## Johnson, Mercante and May (1993, Tab. 1)
benzos <- data.frame(
      cerebellum = c( 3.41,  3.50,  2.85,  4.43,
                      4.04,  7.40,  5.63, 12.86,
                      6.03,  6.08,  5.75,  8.09,  7.56),
       brainstem = c( 3.46,  2.73,  2.22,  3.16,
                      2.59,  4.18,  3.10,  4.49,
                      6.78,  7.54,  5.29,  4.57,  5.39),
          cortex = c(10.52,  7.52,  4.57,  5.48,
                      7.16, 12.00,  9.36,  9.35,
                     11.54, 11.05,  9.92, 13.59, 13.21),
    hypothalamus = c(19.51, 10.00,  8.27, 10.26,
                     11.43, 19.13, 14.03, 15.59,
                     24.87, 14.16, 22.68, 19.93, 29.32),
        striatum = c( 6.98,  5.07,  3.57,  5.34,
                      4.57,  8.82,  5.76, 11.72,
                      6.98,  7.54,  7.66,  9.69,  8.09),
     hippocampus = c(20.31, 13.20,  8.58, 11.42,
                     13.79, 23.71, 18.35, 38.52,
                     21.56, 18.66, 19.24, 27.39, 26.55),
       treatment = factor(rep(c("Lorazepam", "Alprazolam", "Saline"),
                          c(4, 4, 5)))
)

## Approximative (Monte Carlo) multivariate Kruskal-Wallis test
## Johnson, Mercante and May (1993, Tab. 2)
independence_test(cerebellum + brainstem + cortex +
                  hypothalamus + striatum + hippocampus ~ treatment,
                  data = benzos,
                  teststat = "quadratic",
                  distribution = approximate(nresample = 10000),
                  ytrafo = function(data)
                      trafo(data, numeric_trafo = rank_trafo)) # Q = 16.129
```

Example output

```
Loading required package: survival

Exact General Independence Test

data:  asat by group (Compound, Control)
Z = 1.4269, p-value = 0.07809
alternative hypothesis: greater

Exact Two-Sample van der Waerden (Normal Quantile) Test

data:  asat by group (Compound, Control)
Z = 1.4269, p-value = 0.07809
alternative hypothesis: true mu is greater than 0

Approximative General Independence Test

data:  cerebellum, brainstem, cortex, hypothalamus, striatum, hippocampus by
treatment (Alprazolam, Lorazepam, Saline)
chi-squared = 16.129, p-value = 0.0767
```

coin documentation built on Oct. 8, 2021, 9:07 a.m.