# MaximallySelectedStatisticsTests: Generalized Maximally Selected Statistics

In coin: Conditional Inference Procedures in a Permutation Test Framework

## Description

Testing the independence of two sets of variables measured on arbitrary scales against cutpoint alternatives.

## Usage

```r
## S3 method for class 'formula'
maxstat_test(formula, data, subset = NULL, weights = NULL, ...)
## S3 method for class 'table'
maxstat_test(object, ...)
## S3 method for class 'IndependenceProblem'
maxstat_test(object, teststat = c("maximum", "quadratic"),
             distribution = c("asymptotic", "approximate", "none"),
             minprob = 0.1, maxprob = 1 - minprob, ...)
```

## Arguments

`formula`: a formula of the form `y1 + ... + yq ~ x1 + ... + xp | block` where `y1`, ..., `yq` and `x1`, ..., `xp` are measured on arbitrary scales (nominal, ordinal or continuous with or without censoring) and `block` is an optional factor for stratification.

`data`: an optional data frame containing the variables in the model formula.

`subset`: an optional vector specifying a subset of observations to be used. Defaults to `NULL`.

`weights`: an optional formula of the form `~ w` defining integer valued case weights for each observation. Defaults to `NULL`, implying equal weight for all observations.

`object`: an object inheriting from classes `"table"` or `"IndependenceProblem"`.

`teststat`: a character, the type of test statistic to be applied: either a maximum statistic (`"maximum"`, default) or a quadratic form (`"quadratic"`).

`distribution`: a character, the conditional null distribution of the test statistic can be approximated by its asymptotic distribution (`"asymptotic"`, default) or via Monte Carlo resampling (`"approximate"`). Alternatively, the functions `asymptotic` or `approximate` can be used. Computation of the null distribution can be suppressed by specifying `"none"`.

`minprob`: a numeric, a fraction between 0 and 0.5 specifying that only cutpoints greater than the `minprob` * 100 % quantile of `x1`, ..., `xp` are considered. Defaults to `0.1`.

`maxprob`: a numeric, a fraction between 0.5 and 1 specifying that only cutpoints smaller than the `maxprob` * 100 % quantile of `x1`, ..., `xp` are considered. Defaults to `1 - minprob`.

`...`: further arguments to be passed to `independence_test`.
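The effect of `minprob` and `maxprob` can be illustrated with a small base R sketch. The helper `candidate_cutpoints` is hypothetical and not part of coin; it follows the wording above literally (strict inequalities, default `quantile` type), so the package's internal handling of ties and quantile definitions may differ.

```r
## Hypothetical helper (not part of coin): list the candidate cutpoints of a
## numeric covariate that lie strictly between the minprob and maxprob
## quantiles, following the wording of the argument descriptions.
candidate_cutpoints <- function(x, minprob = 0.1, maxprob = 1 - minprob) {
    ux <- sort(unique(x))
    lower <- quantile(x, probs = minprob, names = FALSE)
    upper <- quantile(x, probs = maxprob, names = FALSE)
    ux[ux > lower & ux < upper]
}

x <- 1:20
candidate_cutpoints(x)                  # default: trims the outer 10 % on each side
candidate_cutpoints(x, minprob = 0.25)  # narrower search range
```

Raising `minprob` shrinks the search range, which avoids splits that leave very few observations in one of the two groups.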

## Details

`maxstat_test` provides generalized maximally selected statistics. The family of maximally selected statistics encompasses a large collection of procedures used for the estimation of simple cutpoint models including, but not limited to, maximally selected chi^2 statistics, maximally selected Cochran-Armitage statistics, maximally selected rank statistics and maximally selected statistics for multiple covariates. A general description of these methods is given by Hothorn and Zeileis (2008).

The null hypothesis of independence, or conditional independence given `block`, between `y1`, ..., `yq` and `x1`, ..., `xp` is tested against cutpoint alternatives. All possible partitions into two groups are evaluated for each unordered covariate `x1`, ..., `xp`, whereas only order-preserving binary partitions are evaluated for ordered or numeric covariates. The cutpoint is then a set of levels defining one of the two groups.
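The difference between the two kinds of partitions can be sketched in base R. The functions `ordered_splits` and `unordered_splits` are hypothetical illustrations, not coin functions; each returns the set of levels that defines one of the two groups.

```r
## Hypothetical helpers (not part of coin).

## Order-preserving binary partitions: the first k levels form one group.
ordered_splits <- function(lev) {
    stopifnot(length(lev) >= 2)
    lapply(seq_len(length(lev) - 1), function(k) lev[seq_len(k)])
}

## All binary partitions of unordered levels. The first level is fixed in the
## returned group so that mirror-image partitions are not listed twice.
unordered_splits <- function(lev) {
    stopifnot(length(lev) >= 2)
    rest <- lev[-1]
    codes <- 0:(2^length(rest) - 2)  # excludes the code that selects all levels
    lapply(codes, function(m) {
        pick <- (m %/% 2^(seq_along(rest) - 1)) %% 2 == 1
        c(lev[1], rest[pick])
    })
}

lev <- c("a", "b", "c", "d")
length(ordered_splits(lev))    # k - 1 partitions
length(unordered_splits(lev))  # 2^(k - 1) - 1 partitions
```

For a covariate with k levels, the number of candidate partitions grows linearly in the ordered case and exponentially in the unordered case.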

If both the response and the covariate are univariable, say `y1` and `x1`, this procedure is known as maximally selected chi^2 statistics (Miller and Siegmund, 1982) when `y1` is a binary factor and `x1` is a numeric variable, and as maximally selected rank statistics when `y1` is a rank transformed numeric variable and `x1` is a numeric variable (Lausen and Schumacher, 1992). Lausen et al. (2004) introduced maximally selected statistics for a univariable numeric response and multiple numeric covariates `x1`, ..., `xp`.
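For the univariable case, the idea can be sketched in base R as follows. This is a simplified illustration under stated assumptions, not coin's implementation: the hypothetical function `max_selected_stat` standardizes the sum of the responses in the group `x <= cutpoint` by its mean and variance under permutation of the responses, and then selects the cutpoint with the largest absolute standardized statistic.

```r
## Hypothetical sketch (not coin's implementation) of a maximally selected
## statistic for one numeric covariate x and one numeric response y.
max_selected_stat <- function(x, y, minprob = 0.1, maxprob = 1 - minprob) {
    n <- length(y)
    ux <- sort(unique(x))
    lower <- quantile(x, probs = minprob, names = FALSE)
    upper <- quantile(x, probs = maxprob, names = FALSE)
    cuts <- ux[ux > lower & ux < upper]
    stopifnot(length(cuts) > 0)
    ss <- sum((y - mean(y))^2)
    z <- vapply(cuts, function(cp) {
        g <- x <= cp                               # group indicator for this cutpoint
        m <- sum(g)
        expect <- m * mean(y)                      # permutation mean of sum(y[g])
        variance <- m * (n - m) / (n * (n - 1)) * ss  # permutation variance
        (sum(y[g]) - expect) / sqrt(variance)
    }, numeric(1))
    best <- which.max(abs(z))
    list(statistic = abs(z[best]), cutpoint = cuts[best])
}

## Rank transformed response with a level shift after x = 8
x <- 1:20
y <- rank(c(rep(0, 8), rep(1, 12)))
max_selected_stat(x, y)  # the selected cutpoint is 8
```

Because the cutpoint is chosen to maximize the statistic, the selected value cannot be compared with the usual two-sample reference distribution; the null distribution of the maximum has to be used instead.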

If, say, `y1` and/or `x1` are ordered factors, the default scores, `1:nlevels(y1)` and `1:nlevels(x1)` respectively, can be altered using the `scores` argument (see `independence_test`); this argument can also be used to coerce nominal factors to class `"ordered"`. If both, say, `y1` and `x1` are ordered factors, a linear-by-linear association test is computed and the direction of the alternative hypothesis can be specified using the `alternative` argument. The particular extension to the case of a univariable binary factor response and a univariable ordered covariate was given by Betensky and Rabinowitz (1999) and is known as maximally selected Cochran-Armitage statistics.
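What replacing the default scores means can be shown with a short base R sketch. The factor `income`, its levels, and the custom score values are hypothetical and chosen only for illustration; the snippet shows the numeric representation of an ordered factor under each scoring and does not call coin.

```r
## Hypothetical ordered factor; levels and values are illustrative only.
income <- factor(c("low", "high", "medium", "low", "very high"),
                 levels = c("low", "medium", "high", "very high"),
                 ordered = TRUE)

## Default scores: 1:nlevels(income), i.e., equally spaced levels
default_scores <- seq_len(nlevels(income))
default_scores[as.integer(income)]  # 1 3 2 1 4

## Custom scores: unequal spacing between adjacent levels
custom_scores <- c(3, 10, 20, 35)
custom_scores[as.integer(income)]   # 3 20 10 3 35
```

Custom scores matter whenever the distances between adjacent levels should not be treated as equal.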

The conditional null distribution of the test statistic is used to obtain p-values. By default, an asymptotic approximation of the exact distribution is used (`distribution = "asymptotic"`). Alternatively, the distribution can be approximated via Monte Carlo resampling by setting `distribution` to `"approximate"`. See `asymptotic` and `approximate` for details.
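The Monte Carlo approach can be sketched in base R: permute the response, recompute the maximally selected statistic, and take the proportion of permuted statistics at least as large as the observed one. This is a minimal illustration, not coin's resampling code. The function `max_abs_stat`, the simulated data, the fixed cutpoint grid, and the number of replications are all assumptions made for the example. The sketch relies on the fact that a linear statistic standardized by its permutation mean and variance equals `sqrt(n - 1)` times the sample correlation between the group indicator and the response.

```r
## Hypothetical sketch (not coin's implementation) of a Monte Carlo
## approximation to the p-value of a maximally selected statistic.
max_abs_stat <- function(x, y, cuts) {
    r <- vapply(cuts, function(cp) cor(as.numeric(x <= cp), y), numeric(1))
    sqrt(length(y) - 1) * max(abs(r))
}

set.seed(123)
x <- 1:30
y <- c(rnorm(12, mean = 0), rnorm(18, mean = 2))  # simulated level shift
cuts <- 4:27                                      # assumed cutpoint grid

observed <- max_abs_stat(x, y, cuts)
permuted <- replicate(2000, max_abs_stat(x, sample(y), cuts))
p_value <- mean(permuted >= observed)
p_value
```

The maximum over all cutpoints is recomputed in every replication, which is what accounts for the cutpoint selection in the resulting p-value.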

## Value

An object inheriting from class `"IndependenceTest"`.

## Note

Starting with coin version 1.1-0, maximum statistics and quadratic forms can no longer be specified using `teststat = "maxtype"` and `teststat = "quadtype"` respectively (as was used in versions prior to 0.4-5).

## References

Betensky, R. A. and Rabinowitz, D. (1999). Maximally selected chi^2 statistics for k x 2 tables. Biometrics 55(1), 317–320.

Hothorn, T. and Lausen, B. (2003). On the exact distribution of maximally selected rank statistics. Computational Statistics & Data Analysis 43(2), 121–137.

Hothorn, T. and Zeileis, A. (2008). Generalized maximally selected statistics. Biometrics 64(4), 1263–1269.

Lausen, B., Hothorn, T., Bretz, F. and Schumacher, M. (2004). Optimally selected prognostic factors. Biometrical Journal 46(3), 364–374.

Lausen, B. and Schumacher, M. (1992). Maximally selected rank statistics. Biometrics 48(1), 73–85.

Miller, R. and Siegmund, D. (1982). Maximally selected chi square statistics. Biometrics 38(4), 1011–1016.

Müller, J. and Hothorn, T. (2004). Maximally selected two-sample statistics as a new tool for the identification and assessment of habitat factors with an application to breeding bird communities in oak forests. European Journal of Forest Research 123(3), 219–228.

## Examples

```r
library("coin")

## Tree pipit data (Mueller and Hothorn, 2004)
## Asymptotic maximally selected statistics
maxstat_test(counts ~ coverstorey, data = treepipit)

## Asymptotic maximally selected statistics
## Note: all covariates simultaneously
mt <- maxstat_test(counts ~ ., data = treepipit)
mt@estimates$estimate


## Malignant arrhythmias data (Hothorn and Lausen, 2003, Sec. 7.2)
## Asymptotic maximally selected statistics
maxstat_test(Surv(time, event) ~ EF, data = hohnloser,
             ytrafo = function(data)
                 trafo(data, surv_trafo = function(y)
                     logrank_trafo(y, ties.method = "Hothorn-Lausen")))


## Breast cancer data (Hothorn and Lausen, 2003, Sec. 7.3)
## Asymptotic maximally selected statistics
data("sphase", package = "TH.data")
maxstat_test(Surv(RFS, event) ~ SPF, data = sphase,
             ytrafo = function(data)
                 trafo(data, surv_trafo = function(y)
                     logrank_trafo(y, ties.method = "Hothorn-Lausen")))


## Job satisfaction data (Agresti, 2002, p. 288, Tab. 7.8)
## Asymptotic maximally selected statistics
maxstat_test(jobsatisfaction)

## Asymptotic maximally selected statistics
## Note: 'Job.Satisfaction' and 'Income' as ordinal
maxstat_test(jobsatisfaction,
             scores = list("Job.Satisfaction" = 1:4,
                           "Income" = 1:4))
```

coin documentation built on July 18, 2017, 1:02 a.m.