Generalized Maximally Selected Statistics

Description

Testing the independence of two sets of variables measured on arbitrary scales against cutpoint alternatives.

Usage

## S3 method for class 'formula'
maxstat_test(formula, data, subset = NULL, weights = NULL, ...)
## S3 method for class 'table'
maxstat_test(object, ...)
## S3 method for class 'IndependenceProblem'
maxstat_test(object, teststat = c("maximum", "quadratic"),
             distribution = c("asymptotic", "approximate", "none"),
             minprob = 0.1, maxprob = 1 - minprob, ...)

Arguments

formula

a formula of the form y1 + ... + yq ~ x1 + ... + xp | block where y1, ..., yq and x1, ..., xp are measured on arbitrary scales (nominal, ordinal or continuous with or without censoring) and block is an optional factor for stratification.

data

an optional data frame containing the variables in the model formula.

subset

an optional vector specifying a subset of observations to be used. Defaults to NULL.

weights

an optional formula of the form ~ w defining integer-valued case weights for each observation. Defaults to NULL, implying equal weight for all observations.

object

an object inheriting from classes "table" or "IndependenceProblem".

teststat

a character, the type of test statistic to be applied: either a maximum statistic ("maximum", default) or a quadratic form ("quadratic").

distribution

a character, specifying how the conditional null distribution of the test statistic is approximated: by its asymptotic distribution ("asymptotic", default) or via Monte Carlo resampling ("approximate"). Alternatively, the functions asymptotic or approximate can be used. Computation of the null distribution can be suppressed by specifying "none".

minprob

a numeric, a fraction between 0 and 0.5 specifying that only cutpoints greater than the minprob * 100% quantile of x1, ..., xp are considered. Defaults to 0.1.

maxprob

a numeric, a fraction between 0.5 and 1 specifying that only cutpoints smaller than the maxprob * 100% quantile of x1, ..., xp are considered. Defaults to 1 - minprob.

...

further arguments to be passed to independence_test.
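To illustrate how minprob and maxprob narrow the search, the base-R sketch below collects the candidate cutpoints of a single numeric covariate. The helper candidate_cutpoints() is hypothetical and not part of coin; it assumes type-1 (empirical) quantiles and inclusive boundaries, which may differ from coin's exact boundary handling. Each returned value c defines the split x <= c versus x > c.

```r
## Hypothetical helper (not part of coin): candidate cutpoints of a numeric
## covariate, restricted to lie between its minprob and maxprob quantiles.
candidate_cutpoints <- function(x, minprob = 0.1, maxprob = 1 - minprob) {
  ## type = 1 uses the inverse of the empirical distribution function
  q <- quantile(x, probs = c(minprob, maxprob), type = 1, names = FALSE)
  ux <- sort(unique(x))
  ## each remaining value c defines the split x <= c versus x > c
  ux[ux >= q[1] & ux <= q[2]]
}

candidate_cutpoints(1:20)                  # defaults: minprob = 0.1, maxprob = 0.9
candidate_cutpoints(1:20, minprob = 0.25)  # maxprob defaults to 0.75
```

With the defaults and x = 1:20, the admissible cutpoints are 2, ..., 18, so every split leaves at least two observations (10%) on either side.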

Details

maxstat_test provides generalized maximally selected statistics. The family of maximally selected statistics encompasses a large collection of procedures used for the estimation of simple cutpoint models including, but not limited to, maximally selected chi^2 statistics, maximally selected Cochran-Armitage statistics, maximally selected rank statistics and maximally selected statistics for multiple covariates. A general description of these methods is given by Hothorn and Zeileis (2008).

The null hypothesis of independence, or conditional independence given block, between y1, ..., yq and x1, ..., xp is tested against cutpoint alternatives. All possible partitions into two groups are evaluated for each unordered covariate x1, ..., xp, whereas only order-preserving binary partitions are evaluated for ordered or numeric covariates. The cutpoint is then a set of levels defining one of the two groups.
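The difference in search effort between the two kinds of covariate can be made concrete by enumerating the candidate groups. In the sketch below, ordered_partitions() and unordered_partitions() are hypothetical helpers (not part of coin); each returns the sets of levels defining one of the two groups. An ordered factor with k levels yields k - 1 splits, whereas an unordered one yields 2^(k - 1) - 1, because a group and its complement describe the same partition and the trivial split is excluded.

```r
## Hypothetical helpers (not part of coin): enumerate the binary partitions of
## a covariate's levels; each element is the set of levels forming one group.
ordered_partitions <- function(lev) {
  ## order-preserving splits {l1, ..., lj} vs. {l(j+1), ..., lk}
  lapply(seq_len(length(lev) - 1), function(j) lev[seq_len(j)])
}

unordered_partitions <- function(lev) {
  k <- length(lev)
  others <- lev[-1]
  ## keep the first level in the defining group so that each partition is
  ## listed once; i runs over all subsets of 'others' except the full set,
  ## which would put every level into one group
  lapply(0:(2^(k - 1) - 2), function(i) {
    in_group <- bitwAnd(i, 2^(seq_len(k - 1) - 1)) > 0
    c(lev[1], others[in_group])
  })
}

ordered_partitions(c("low", "mid", "high"))  # 2 splits
unordered_partitions(c("a", "b", "c"))       # 3 splits
```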

If both response and covariate are univariable, say y1 and x1, this procedure is known as maximally selected chi^2 statistics (Miller and Siegmund, 1982) when y1 is a binary factor and x1 is a numeric variable, and as maximally selected rank statistics when y1 is a rank-transformed numeric variable and x1 is a numeric variable (Lausen and Schumacher, 1992). Lausen et al. (2004) introduced maximally selected statistics for a univariable numeric response and multiple numeric covariates x1, ..., xp.
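For a univariable numeric response and covariate, the procedure can be sketched in base R: for every candidate cutpoint, the sum of the response in the group x <= c is standardized with its conditional expectation and variance under the permutation null hypothesis, and the largest absolute value is kept. The function max_selected() is a hypothetical illustration, not coin's implementation; passing rank(y) instead of y gives a rank-based version of the same statistic.

```r
## Hypothetical sketch (not coin's implementation): maximally selected
## standardized two-sample statistic for numeric y and x over given cutpoints.
max_selected <- function(y, x, cuts) {
  n <- length(y)
  vy <- mean((y - mean(y))^2)                 # variance of y with divisor n
  z <- vapply(cuts, function(cp) {
    g <- x <= cp                              # group defined by this cutpoint
    m <- sum(g)
    stat <- sum(y[g])                         # linear statistic T
    expect <- m * mean(y)                     # E(T) under permutation
    variance <- vy * m * (n - m) / (n - 1)    # Var(T) under permutation
    abs(stat - expect) / sqrt(variance)
  }, numeric(1))
  list(statistic = max(z), cutpoint = cuts[which.max(z)])
}

## a step in the mean of y between x = 5 and x = 6
res <- max_selected(y = c(rep(0, 5), rep(1, 5)), x = 1:10, cuts = 1:9)
res$cutpoint   # 5
res$statistic  # 3
```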

If, say, y1 and/or x1 are ordered factors, the default scores, 1:nlevels(y1) and 1:nlevels(x1) respectively, can be altered using the scores argument (see independence_test); this argument can also be used to coerce nominal factors to class "ordered". If both, say, y1 and x1 are ordered factors, a linear-by-linear association test is computed and the direction of the alternative hypothesis can be specified using the alternative argument. The particular extension to the case of a univariable binary factor response and a univariable ordered covariate was given by Betensky and Rabinowitz (1999) and is known as maximally selected Cochran-Armitage statistics.
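With scores attached to both ordered factors, the linear-by-linear association statistic is the standardized sum of products of scores. The sketch below (hypothetical function lbl_statistic(), not coin's code) computes it in base R with the default scores 1:nlevels(); its sign reflects the direction of the association, which a one-sided alternative takes into account, and its square equals (n - 1) r^2, where r is the Pearson correlation of the scores.

```r
## Hypothetical sketch (not coin's code): standardized linear-by-linear
## association statistic for two ordered factors with numeric scores.
lbl_statistic <- function(y, x, yscores = seq_len(nlevels(y)),
                          xscores = seq_len(nlevels(x))) {
  u <- yscores[as.integer(y)]    # score of each observation's y level
  v <- xscores[as.integer(x)]    # score of each observation's x level
  n <- length(u)
  stat <- sum(u * v)             # linear statistic T
  expect <- sum(u) * sum(v) / n  # E(T) under permutation
  vu <- mean((u - mean(u))^2)
  vv <- mean((v - mean(v))^2)
  variance <- n^2 / (n - 1) * vu * vv  # Var(T) under permutation
  (stat - expect) / sqrt(variance)
}

y <- factor(c(1, 1, 2, 3, 3, 2), ordered = TRUE)
x <- factor(c(1, 2, 2, 3, 3, 1), ordered = TRUE)
lbl_statistic(y, x)                        # default scores 1:3
lbl_statistic(y, x, xscores = c(1, 2, 4))  # user-defined scores for x
```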

The conditional null distribution of the test statistic is used to obtain p-values and an asymptotic approximation of the exact distribution is used by default (distribution = "asymptotic"). Alternatively, the distribution can be approximated via Monte Carlo resampling by setting distribution to "approximate". See asymptotic and approximate for details.
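The Monte Carlo idea behind distribution = "approximate" can be sketched as follows: the response is permuted B times, the maximally selected statistic is recomputed for each permutation, and the p-value is estimated by the proportion of permuted statistics at least as large as the observed one. The helpers max_abs_stat() and mc_pvalue() are hypothetical and not coin's implementation; in coin itself the number of replications is controlled through approximate() (see its documentation).

```r
## Hypothetical sketch (not coin's implementation) of a Monte Carlo p-value
## for a maximally selected statistic with a numeric response y.
max_abs_stat <- function(y, x, cuts) {
  n <- length(y)
  vy <- mean((y - mean(y))^2)
  max(vapply(cuts, function(cp) {
    g <- x <= cp
    m <- sum(g)
    ## standardized linear statistic for the split x <= cp vs. x > cp
    abs(sum(y[g]) - m * mean(y)) / sqrt(vy * m * (n - m) / (n - 1))
  }, numeric(1)))
}

mc_pvalue <- function(y, x, cuts, B = 9999) {
  observed <- max_abs_stat(y, x, cuts)
  ## permuting y mimics independence of y and x under the null hypothesis
  permuted <- replicate(B, max_abs_stat(sample(y), x, cuts))
  mean(permuted >= observed)
}

set.seed(29)
x <- 1:40
y <- rnorm(40) + (x > 20)  # shift in the mean of y above x = 20
mc_pvalue(y, x, cuts = 4:36, B = 999)
```

Because the maximum is taken over all cutpoints in every permutation, the resulting p-value accounts for the selection of the cutpoint.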

Value

An object inheriting from class "IndependenceTest".

Note

Starting with coin version 1.1-0, maximum statistics and quadratic forms can no longer be specified using teststat = "maxtype" and teststat = "quadtype" respectively (as was used in versions prior to 0.4-5).

References

Betensky, R. A. and Rabinowitz, D. (1999). Maximally selected chi^2 statistics for k x 2 tables. Biometrics 55(1), 317–320.

Hothorn, T. and Lausen, B. (2003). On the exact distribution of maximally selected rank statistics. Computational Statistics & Data Analysis 43(2), 121–137.

Hothorn, T. and Zeileis, A. (2008). Generalized maximally selected statistics. Biometrics 64(4), 1263–1269.

Lausen, B., Hothorn, T., Bretz, F. and Schumacher, M. (2004). Optimally selected prognostic factors. Biometrical Journal 46(3), 364–374.

Lausen, B. and Schumacher, M. (1992). Maximally selected rank statistics. Biometrics 48(1), 73–85.

Miller, R. and Siegmund, D. (1982). Maximally selected chi square statistics. Biometrics 38(4), 1011–1016.

Müller, J. and Hothorn, T. (2004). Maximally selected two-sample statistics as a new tool for the identification and assessment of habitat factors with an application to breeding bird communities in oak forests. European Journal of Forest Research 123(3), 219–228.

Examples

library("coin")      # maxstat_test() and the example data sets
library("survival")  # Surv()

## Tree pipit data (Mueller and Hothorn, 2004)
## Asymptotic maximally selected statistics
maxstat_test(counts ~ coverstorey, data = treepipit)

## Asymptotic maximally selected statistics
## Note: all covariates simultaneously
mt <- maxstat_test(counts ~ ., data = treepipit)
mt@estimates$estimate


## Malignant arrhythmias data (Hothorn and Lausen, 2003, Sec. 7.2)
## Asymptotic maximally selected statistics
maxstat_test(Surv(time, event) ~ EF, data = hohnloser,
             ytrafo = function(data)
                 trafo(data, surv_trafo = function(y)
                     logrank_trafo(y, ties.method = "Hothorn-Lausen")))


## Breast cancer data (Hothorn and Lausen, 2003, Sec. 7.3)
## Asymptotic maximally selected statistics
data("sphase", package = "TH.data")
maxstat_test(Surv(RFS, event) ~ SPF, data = sphase,
             ytrafo = function(data)
                 trafo(data, surv_trafo = function(y)
                     logrank_trafo(y, ties.method = "Hothorn-Lausen")))


## Job satisfaction data (Agresti, 2002, p. 288, Tab. 7.8)
## Asymptotic maximally selected statistics
maxstat_test(jobsatisfaction)

## Asymptotic maximally selected statistics
## Note: 'Job.Satisfaction' and 'Income' as ordinal
maxstat_test(jobsatisfaction,
             scores = list("Job.Satisfaction" = 1:4,
                           "Income" = 1:4))
