# Generalized Maximally Selected Statistics

### Description

Testing the independence of two sets of variables measured on arbitrary scales against cutpoint alternatives.

### Usage

```r
## S3 method for class 'formula'
maxstat_test(formula, data, subset = NULL, weights = NULL, ...)
## S3 method for class 'table'
maxstat_test(object, ...)
## S3 method for class 'IndependenceProblem'
maxstat_test(object, teststat = c("maximum", "quadratic"),
             distribution = c("asymptotic", "approximate", "none"),
             minprob = 0.1, maxprob = 1 - minprob, ...)
```

### Arguments

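| Argument | Description |
| --- | --- |
| `formula` | a formula of the form `y1 + ... + yq ~ x1 + ... + xp \| block`, where `y1`, ..., `yq` and `x1`, ..., `xp` are measured on arbitrary scales (nominal, ordinal or continuous, with or without censoring) and `block` is an optional factor for stratification. |
| `data` | an optional data frame containing the variables in the model formula. |
| `subset` | an optional vector specifying a subset of observations to be used. Defaults to `NULL`. |
| `weights` | an optional formula of the form `~ w` defining integer valued case weights for each observation. Defaults to `NULL`, implying equal weight for all observations. |
| `object` | an object inheriting from classes `"table"` or `"IndependenceProblem"`. |
| `teststat` | a character, the type of test statistic to be applied: either a maximum statistic (`"maximum"`, default) or a quadratic form (`"quadratic"`). |
| `distribution` | a character, the conditional null distribution of the test statistic can be approximated by its asymptotic distribution (`"asymptotic"`, default) or via Monte Carlo resampling (`"approximate"`). Alternatively, the functions `asymptotic` or `approximate` can be used. Computation of the null distribution can be suppressed by specifying `"none"`. |
| `minprob` | a numeric, a fraction between 0 and 0.5 specifying that cutpoints only greater than the `minprob` · 100% quantile of `x1`, ..., `xp` are considered. Defaults to `0.1`. |
| `maxprob` | a numeric, a fraction between 0.5 and 1 specifying that cutpoints only smaller than the `maxprob` · 100% quantile of `x1`, ..., `xp` are considered. Defaults to `1 - minprob`. |
| `...` | further arguments to be passed to `independence_test`. |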

### Details

`maxstat_test` provides generalized maximally selected statistics. The
family of maximally selected statistics encompasses a large collection of
procedures used for the estimation of simple cutpoint models including, but
not limited to, maximally selected *χ²* statistics, maximally selected
Cochran-Armitage statistics, maximally selected rank statistics and
maximally selected statistics for multiple covariates. A general description
of these methods is given by Hothorn and Zeileis (2008).

The null hypothesis of independence, or conditional independence given
`block`, between `y1`, ..., `yq` and `x1`, ..., `xp` is tested against
cutpoint alternatives. All possible partitions into two groups are evaluated
for each unordered covariate `x1`, ..., `xp`, whereas only order-preserving
binary partitions are evaluated for ordered or numeric covariates. The
cutpoint is then a set of levels defining one of the two groups.
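To make the difference between the two kinds of covariates concrete, here is a minimal base-R sketch under stated assumptions. The helpers `ordered_cutpoints()` and `unordered_partitions()` are hypothetical (not part of coin); the first applies the `minprob`/`maxprob` restriction via sample quantiles, which may differ from coin's exact handling of ties and boundaries. A factor with *k* unordered levels yields 2^(*k* - 1) - 1 candidate splits, whereas an ordered covariate with *k* distinct values yields at most *k* - 1.

```r
## Hypothetical helpers (not part of coin) enumerating binary partitions.

## Ordered or numeric covariate: group 1 is {x <= cutpoint}. The largest
## observed value is dropped because it would leave group 2 empty, and only
## cutpoints strictly between the minprob and maxprob sample quantiles are
## kept (an approximation; coin may treat ties and boundaries differently).
ordered_cutpoints <- function(x, minprob = 0.1, maxprob = 1 - minprob) {
  ux <- sort(unique(x))
  ux <- ux[-length(ux)]
  lo <- quantile(x, probs = minprob, names = FALSE)
  hi <- quantile(x, probs = maxprob, names = FALSE)
  ux[ux > lo & ux < hi]
}

## Unordered factor with k levels: each partition is identified by the
## group containing the first level, so every one of the 2^(k - 1) - 1
## splits appears exactly once (complements are not counted twice).
unordered_partitions <- function(lev) {
  k <- length(lev)
  rest <- lev[-1]
  lapply(seq_len(2^(k - 1) - 1) - 1, function(i) {
    ## bit j of i decides whether rest[j] joins lev[1] in group 1
    in_group1 <- (i %/% 2^(seq_len(k - 1) - 1)) %% 2 == 1
    c(lev[1], rest[in_group1])
  })
}

ordered_cutpoints(1:10)                      # 2 3 4 5 6 7 8 9
length(unordered_partitions(letters[1:4]))   # 7
```

The exponential growth for unordered factors is the reason why many levels make the search over partitions expensive, while ordered covariates stay linear in the number of distinct values.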

If both response and covariate are univariable, say `y1` and `x1`, this
procedure is known as maximally selected *χ²* statistics (Miller and
Siegmund, 1982) when `y1` is a binary factor and `x1` is a numeric variable,
and as maximally selected rank statistics when `y1` is a rank-transformed
numeric variable and `x1` is a numeric variable (Lausen and Schumacher,
1992). Lausen *et al.* (2004) introduced maximally selected statistics for a
univariable numeric response and multiple numeric covariates `x1`, ...,
`xp`.
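For a univariable numeric response and a single numeric covariate, the statistic can be sketched by hand: for each cutpoint, the linear statistic (the sum of the responses in the lower group) is standardized by its permutation mean and variance, and the maximum of the absolute standardized values is taken. `maxstat_by_hand()` is a hypothetical helper, not coin's implementation; it assumes no blocks, no weights and no censoring. Replacing `y` by `rank(y)` gives a maximally selected rank statistic in the spirit of Lausen and Schumacher (1992).

```r
## Hypothetical helper (not coin's implementation): maximally selected
## two-sample statistic for numeric y and numeric x, no blocks or weights.
maxstat_by_hand <- function(y, x, minprob = 0.1, maxprob = 1 - minprob) {
  n <- length(y)
  ## candidate cutpoints: observed values (except the largest) strictly
  ## between the minprob and maxprob sample quantiles of x
  ux <- sort(unique(x))
  ux <- ux[-length(ux)]
  lo <- quantile(x, probs = minprob, names = FALSE)
  hi <- quantile(x, probs = maxprob, names = FALSE)
  cuts <- ux[ux > lo & ux < hi]
  stats <- vapply(cuts, function(cp) {
    g <- x <= cp                          # group 1 indicator
    n1 <- sum(g)
    tstat <- sum(y[g])                    # linear statistic
    mu <- n1 * mean(y)                    # permutation mean
    sigma2 <- var(y) * n1 * (n - n1) / n  # permutation variance
    abs(tstat - mu) / sqrt(sigma2)
  }, numeric(1))
  list(statistic = max(stats), cutpoint = cuts[which.max(stats)])
}

y <- c(1, 2, 3, 4, 10, 11, 12, 13)
maxstat_by_hand(y, x = 1:8)$cutpoint         # 4
## maximally selected rank statistic: use rank(y) as the response
maxstat_by_hand(rank(y), x = 1:8)$cutpoint   # 4
```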

If, say, `y1` and/or `x1` are ordered factors, the default scores,
`1:nlevels(y1)` and `1:nlevels(x1)` respectively, can be altered using the
`scores` argument (see `independence_test`); this argument can also be used
to coerce nominal factors to class `"ordered"`.
If, say, both `y1` and `x1` are ordered factors, a linear-by-linear
association test is computed and the direction of the alternative hypothesis
can be specified using the `alternative` argument. The particular extension
to the case of a univariable binary factor response and a univariable
ordered covariate was given by Betensky and Rabinowitz (1999) and is known
as maximally selected Cochran-Armitage statistics.
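The maximally selected Cochran-Armitage statistic for a 2 × *k* table can be sketched the same way: each cutpoint splits the ordered levels into levels 1, ..., *j* and *j* + 1, ..., *k*, and the count in the first response category among levels 1, ..., *j* is standardized by its permutation (hypergeometric) mean and variance. `max_cochran_armitage()` is a hypothetical helper, not coin's code; unlike `maxstat_test`, it evaluates every cutpoint and does not apply the `minprob`/`maxprob` restriction.

```r
## Hypothetical helper: maximally selected Cochran-Armitage-type statistic
## for a 2 x k table (rows: binary response, columns: ordered levels).
max_cochran_armitage <- function(tab) {
  succ <- tab[1, ]          # counts in the first response category
  size <- colSums(tab)      # number of observations per covariate level
  n <- sum(size)
  p <- sum(succ) / n        # overall proportion in the first category
  stats <- vapply(seq_len(ncol(tab) - 1), function(j) {
    n1 <- sum(size[1:j])                  # observations in levels 1..j
    dev <- sum(succ[1:j]) - n1 * p        # observed minus expected count
    ## hypergeometric variance of the count in levels 1..j
    abs(dev) / sqrt(p * (1 - p) * n1 * (n - n1) / (n - 1))
  }, numeric(1))
  list(statistic = max(stats), cut_after_level = which.max(stats))
}

tab <- matrix(c(10, 2, 8, 4, 3, 9, 1, 11), nrow = 2)
max_cochran_armitage(tab)$cut_after_level   # 2
```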

The conditional null distribution of the test statistic is used to obtain
*p*-values, and an asymptotic approximation of the exact distribution is
used by default (`distribution = "asymptotic"`). Alternatively, the
distribution can be approximated via Monte Carlo resampling by setting
`distribution` to `"approximate"`. See `asymptotic` and `approximate` for
details.
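The idea behind Monte Carlo resampling can be sketched in base R: permuting the response keeps its marginal distribution but destroys any association with the covariate, so maxima computed on permuted data approximate the conditional null distribution of the maximum statistic. `max_std_stat()` and `mc_pvalue()` are hypothetical helpers, not coin's code; coin's resampling details (for example, how the *p*-value is estimated or how blocks are handled) may differ.

```r
## Hypothetical helper: maximum of the absolute standardized two-sample
## statistics over the given cutpoints (no blocks, no weights).
max_std_stat <- function(y, x, cuts) {
  n <- length(y)
  max(vapply(cuts, function(cp) {
    g <- x <= cp
    n1 <- sum(g)
    abs(sum(y[g]) - n1 * mean(y)) / sqrt(var(y) * n1 * (n - n1) / n)
  }, numeric(1)))
}

## Hypothetical helper: Monte Carlo p-value from B permutations of y.
mc_pvalue <- function(y, x, cuts, B = 9999) {
  obs <- max_std_stat(y, x, cuts)
  perm <- replicate(B, max_std_stat(sample(y), x, cuts))
  ## proportion of permuted maxima at least as large as the observed one
  mean(perm >= obs)
}

set.seed(1)
x <- 1:40
y <- c(rnorm(20), rnorm(20, mean = 2))
mc_pvalue(y, x, cuts = 4:36, B = 999)
```

Taking the maximum inside each permutation, rather than computing one *p*-value per cutpoint, is what accounts for the multiple testing over cutpoints.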

### Value

An object inheriting from class `"IndependenceTest"`.

### Note

Starting with coin version 1.1-0, maximum statistics and quadratic forms
can no longer be specified using `teststat = "maxtype"` and
`teststat = "quadtype"`, respectively (as was used in versions prior to
0.4-5).

### References

Betensky, R. A. and Rabinowitz, D. (1999). Maximally selected
*χ²* statistics for *k* × 2 tables.
*Biometrics* **55**(1), 317–320.

Hothorn, T. and Lausen, B. (2003). On the exact distribution of maximally
selected rank statistics. *Computational Statistics & Data Analysis*
**43**(2), 121–137.

Hothorn, T. and Zeileis, A. (2008). Generalized maximally selected
statistics. *Biometrics* **64**(4), 1263–1269.

Lausen, B., Hothorn, T., Bretz, F. and Schumacher, M. (2004). Optimally
selected prognostic factors. *Biometrical Journal* **46**(3),
364–374.

Lausen, B. and Schumacher, M. (1992). Maximally selected rank statistics.
*Biometrics* **48**(1), 73–85.

Miller, R. and Siegmund, D. (1982). Maximally selected chi square
statistics. *Biometrics* **38**(4), 1011–1016.

Müller, J. and Hothorn, T. (2004). Maximally selected
two-sample statistics as a new tool for the identification and assessment of
habitat factors with an application to breeding bird communities in oak
forests. *European Journal of Forest Research* **123**(3), 219–228.

### Examples

```r
library("coin")
library("survival")  # provides Surv()

## Tree pipit data (Mueller and Hothorn, 2004)
## Asymptotic maximally selected statistics
maxstat_test(counts ~ coverstorey, data = treepipit)

## Asymptotic maximally selected statistics
## Note: all covariates simultaneously
mt <- maxstat_test(counts ~ ., data = treepipit)
mt@estimates$estimate  # estimated cutpoint

## Malignant arrhythmias data (Hothorn and Lausen, 2003, Sec. 7.2)
## Asymptotic maximally selected statistics
maxstat_test(Surv(time, event) ~ EF, data = hohnloser,
             ytrafo = function(data)
                 trafo(data, surv_trafo = function(y)
                     logrank_trafo(y, ties.method = "Hothorn-Lausen")))

## Breast cancer data (Hothorn and Lausen, 2003, Sec. 7.3)
## Asymptotic maximally selected statistics
## Note: requires the TH.data package
data("sphase", package = "TH.data")
maxstat_test(Surv(RFS, event) ~ SPF, data = sphase,
             ytrafo = function(data)
                 trafo(data, surv_trafo = function(y)
                     logrank_trafo(y, ties.method = "Hothorn-Lausen")))

## Job satisfaction data (Agresti, 2002, p. 288, Tab. 7.8)
## Asymptotic maximally selected statistics
maxstat_test(jobsatisfaction)

## Asymptotic maximally selected statistics
## Note: 'Job.Satisfaction' and 'Income' as ordinal
maxstat_test(jobsatisfaction,
             scores = list("Job.Satisfaction" = 1:4,
                           "Income" = 1:4))
```