Generalized Maximally Selected Statistics
Testing the independence of two sets of variables measured on arbitrary scales against cutpoint alternatives.
## S3 method for class 'formula'
maxstat_test(formula, data, subset = NULL, weights = NULL, ...)

## S3 method for class 'table'
maxstat_test(object, ...)

## S3 method for class 'IndependenceProblem'
maxstat_test(object, teststat = c("maximum", "quadratic"),
             distribution = c("asymptotic", "approximate", "none"),
             minprob = 0.1, maxprob = 1 - minprob, ...)
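formula: a formula with the response variables on the left-hand side and the covariates x1, ..., xp on the right-hand side.
data: an optional data frame containing the variables in the model formula.
subset: an optional vector specifying a subset of observations to be used. Defaults to NULL.
weights: an optional formula specifying case weights for the observations. Defaults to NULL.
object: an object inheriting from class "table" or "IndependenceProblem".
teststat: a character, the type of test statistic to be applied: either a maximum statistic ("maximum", default) or a quadratic form ("quadratic").
distribution: a character, the conditional null distribution of the test statistic can be approximated by its asymptotic distribution ("asymptotic", default) or via Monte Carlo resampling ("approximate"). Computation of the null distribution can be suppressed by specifying "none".
minprob: a numeric, a fraction between 0 and 0.5 specifying that only cutpoints greater than the minprob · 100% quantile of the covariate are considered. Defaults to 0.1.
maxprob: a numeric, a fraction between 0.5 and 1 specifying that only cutpoints smaller than the maxprob · 100% quantile of the covariate are considered. Defaults to 1 - minprob.
...: further arguments to be passed on, for example scores or alternative.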
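As a rough illustration of the minprob and maxprob arguments, the base R sketch below lists the candidate cutpoints of a single numeric covariate. It assumes that a cutpoint c splits the observations into {x <= c} and {x > c} and that only observed values strictly between the two sample quantiles (R's default quantile type) are kept; the helper candidate_cutpoints is hypothetical, and coin's own handling of quantiles and ties may differ.

```r
## Hypothetical helper (not part of coin): candidate cutpoints for one
## numeric covariate, restricted by minprob and maxprob.
## Assumption: only observed values strictly between the minprob and maxprob
## sample quantiles (quantile type 7, R's default) are kept.
candidate_cutpoints <- function(x, minprob = 0.1, maxprob = 1 - minprob) {
  stopifnot(minprob >= 0, minprob <= 0.5, maxprob >= 0.5, maxprob <= 1)
  lower <- quantile(x, probs = minprob, names = FALSE)
  upper <- quantile(x, probs = maxprob, names = FALSE)
  ux <- sort(unique(x))
  ux[ux > lower & ux < upper]
}

candidate_cutpoints(1:20)                  # 3, 4, ..., 18
candidate_cutpoints(1:20, minprob = 0.25)  # 6, 7, ..., 15
```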
maxstat_test provides generalized maximally selected statistics. The
family of maximally selected statistics encompasses a large collection of
procedures used for the estimation of simple cutpoint models including, but
not limited to, maximally selected chi^2 statistics, maximally
selected Cochran-Armitage statistics, maximally selected rank statistics and
maximally selected statistics for multiple covariates. A general description
of these methods is given by Hothorn and Zeileis (2008).
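The idea common to these procedures can be sketched in base R for a numeric response y and a single numeric covariate x: every cutpoint defines a two-sample statistic (the sum of y in the group x <= cutpoint), which is standardized with its mean and variance under the permutation null distribution, and the largest absolute standardized value is selected together with its cutpoint. This is a simplified sketch, not coin's implementation; the function maxstat_sketch and the cutpoints passed to it are assumptions.

```r
## Hypothetical, simplified sketch (not coin's implementation) of a maximally
## selected two-sample statistic for numeric y and numeric x.
maxstat_sketch <- function(y, x, cuts) {
  n  <- length(y)
  vy <- mean((y - mean(y))^2)              # variance of y with divisor n
  z <- vapply(cuts, function(cp) {
    g  <- x <= cp                          # group indicator for this cutpoint
    n1 <- sum(g)
    if (n1 == 0 || n1 == n) stop("each cutpoint must leave two non-empty groups")
    mu <- n1 * mean(y)                     # permutation mean of sum(y[g])
    s2 <- vy * n1 * (n - n1) / (n - 1)     # permutation variance of sum(y[g])
    (sum(y[g]) - mu) / sqrt(s2)
  }, numeric(1))
  list(statistic = max(abs(z)), estimate = cuts[which.max(abs(z))])
}

y <- c(1, 1, 1, 1, 5, 5, 5, 5)
x <- 1:8
res <- maxstat_sketch(y, x, cuts = 1:7)
res$estimate   # 4
res$statistic  # sqrt(7), about 2.646
```

Replacing y by rank(y) gives a sketch of a maximally selected rank statistic.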
The null hypothesis of independence, or conditional independence given a blocking factor, between the responses and the covariates x1, ..., xp is tested against cutpoint alternatives. All possible partitions into two groups are evaluated for each unordered covariate, whereas only order-preserving binary partitions are evaluated for ordered or numeric covariates. The cutpoint is then a set of levels defining one of the two groups.
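The difference between unordered and ordered covariates can be made concrete by counting partitions: a factor with k levels admits 2^(k - 1) - 1 splits into two non-empty groups, but only k - 1 order-preserving splits. The hypothetical helpers below enumerate both, returning each partition as the set of levels forming one group; they are illustrations, not functions exported by coin.

```r
## Hypothetical helpers (not part of coin) enumerating the two-group
## partitions evaluated for a covariate with levels `lev`.
## Each partition is returned as the set of levels forming one group.

## Ordered or numeric covariate: only order-preserving splits.
ordered_splits <- function(lev) {
  stopifnot(length(lev) >= 2)
  lapply(seq_len(length(lev) - 1), function(i) lev[seq_len(i)])
}

## Unordered covariate: all splits into two non-empty groups. The first level
## is always placed in the returned group so each partition appears once.
unordered_splits <- function(lev) {
  k <- length(lev)
  stopifnot(k >= 2)
  masks <- 0:(2^(k - 1) - 2)               # subsets of lev[-1], excluding the full set
  bits  <- 2^(seq_len(k - 1) - 1)          # bit values 1, 2, 4, ...
  lapply(masks, function(m) {
    in_group <- (m %/% bits) %% 2 == 1     # which of lev[-1] join the first level
    c(lev[1], lev[-1][in_group])
  })
}

lev <- c("low", "mid", "high", "top")
length(ordered_splits(lev))    # k - 1 = 3
length(unordered_splits(lev))  # 2^(k - 1) - 1 = 7
```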
If both response and covariate are univariate, say y1 and x1, this procedure is known as maximally selected chi^2 statistics (Miller and Siegmund, 1982) when y1 is a binary factor and x1 is a numeric variable, and as maximally selected rank statistics when y1 is a rank-transformed numeric variable and x1 is a numeric variable (Lausen and Schumacher, 1992). Lausen et al. (2004) introduced maximally selected statistics for a univariate numeric response and multiple covariates x1, ..., xp.
If, say, y1 and/or x1 are ordered factors, the default scores, 1:nlevels(y1) and 1:nlevels(x1) respectively, can be altered using the scores argument; this argument can also be used to coerce nominal factors to class "ordered". If both y1 and x1 are ordered factors, a linear-by-linear association test is computed and the direction of the alternative hypothesis can be specified using the alternative argument. The particular extension to the case of a univariate binary factor response and a univariate ordered covariate was given by Betensky and Rabinowitz (1999) and is known as maximally selected Cochran-Armitage statistics.
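For a binary response and an ordered covariate summarized as a 2 × k table, a maximally selected Cochran-Armitage-type statistic can be sketched by collapsing the columns at each of the k - 1 order-preserving splits and taking the largest Pearson chi-square statistic of the resulting 2 × 2 tables. The helper maxchisq_ordered and the counts are hypothetical; on the permutation-standardized scale, the squared statistic of a 2 × 2 table equals the Pearson statistic multiplied by (n - 1)/n, so the selected split is the same on either scale.

```r
## Hypothetical helper (not part of coin): maximally selected chi-square
## statistic over the order-preserving splits of a 2 x k table whose rows are
## the two response categories and whose columns are the ordered levels.
maxchisq_ordered <- function(tab) {
  stopifnot(nrow(tab) == 2, ncol(tab) >= 2)
  k <- ncol(tab)
  chisq <- vapply(seq_len(k - 1), function(i) {
    left  <- rowSums(tab[, seq_len(i), drop = FALSE])   # collapse levels 1..i
    right <- rowSums(tab[, -seq_len(i), drop = FALSE])  # collapse levels (i+1)..k
    n11 <- left[[1]]; n12 <- right[[1]]
    n21 <- left[[2]]; n22 <- right[[2]]
    n <- n11 + n12 + n21 + n22
    ## Pearson chi-square of the collapsed 2 x 2 table (no continuity correction)
    n * (n11 * n22 - n12 * n21)^2 /
      ((n11 + n12) * (n21 + n22) * (n11 + n21) * (n12 + n22))
  }, numeric(1))
  list(statistic = max(chisq), split_after = colnames(tab)[which.max(chisq)])
}

tab <- matrix(c(8, 2,  7, 3,  2, 8,  1, 9), nrow = 2,
              dimnames = list(response = c("yes", "no"),
                              level = c("L1", "L2", "L3", "L4")))
res <- maxchisq_ordered(tab)
res$split_after  # "L2": levels L1 and L2 versus L3 and L4
res$statistic    # 160/11, about 14.55
```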
The conditional null distribution of the test statistic is used to obtain p-values, and an asymptotic approximation of the exact distribution is used by default (distribution = "asymptotic"). Alternatively, the distribution can be approximated via Monte Carlo resampling by setting distribution to "approximate"; see approximate for details.
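The Monte Carlo idea behind distribution = "approximate" can be sketched in base R: permute the response, recompute the maximally selected statistic for each permutation, and estimate the p-value as the proportion of permuted statistics at least as large as the observed one. The helpers are hypothetical and use the standardized two-sample statistic over all cutpoints of a numeric covariate; adding one to numerator and denominator is a common convention that keeps the estimate positive, and coin's approximate() may use a different estimator.

```r
## Hypothetical helpers (not part of coin): Monte Carlo approximation of the
## permutation null distribution of a maximally selected statistic.

## Largest absolute standardized two-sample statistic over all cutpoints of x.
max_std_stat <- function(y, x) {
  n    <- length(y)
  vy   <- mean((y - mean(y))^2)             # variance of y with divisor n
  cuts <- head(sort(unique(x)), -1)         # each cutpoint leaves two non-empty groups
  z <- vapply(cuts, function(cp) {
    g  <- x <= cp
    n1 <- sum(g)
    (sum(y[g]) - n1 * mean(y)) / sqrt(vy * n1 * (n - n1) / (n - 1))
  }, numeric(1))
  max(abs(z))
}

## Estimated p-value from nresample random permutations of the response.
mc_pvalue <- function(y, x, nresample = 9999) {
  tobs  <- max_std_stat(y, x)
  tperm <- replicate(nresample, max_std_stat(sample(y), x))
  tol   <- sqrt(.Machine$double.eps)        # treat near-ties as ties
  (1 + sum(tperm >= tobs - tol)) / (1 + nresample)
}

set.seed(1)
x <- 1:20
y <- c(rep(0, 10), rep(1, 10))              # response separated perfectly at x = 10
mc_pvalue(y, x, nresample = 999)            # expected to be small
```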
An object inheriting from class
Starting with coin version 1.1-0, maximum statistics and quadratic forms can no longer be specified using teststat = "maxtype" and teststat = "quadtype", respectively (as was used in earlier versions); use teststat = "maximum" and teststat = "quadratic" instead.
Betensky, R. A. and Rabinowitz, D. (1999). Maximally selected χ² statistics for k × 2 tables. Biometrics 55(1), 317–320.
Hothorn, T. and Lausen, B. (2003). On the exact distribution of maximally selected rank statistics. Computational Statistics & Data Analysis 43(2), 121–137.
Hothorn, T. and Zeileis, A. (2008). Generalized maximally selected statistics. Biometrics 64(4), 1263–1269.
Lausen, B., Hothorn, T., Bretz, F. and Schumacher, M. (2004). Optimally selected prognostic factors. Biometrical Journal 46(3), 364–374.
Lausen, B. and Schumacher, M. (1992). Maximally selected rank statistics. Biometrics 48(1), 73–85.
Miller, R. and Siegmund, D. (1982). Maximally selected chi square statistics. Biometrics 38(4), 1011–1016.
Müller, J. and Hothorn, T. (2004). Maximally selected two-sample statistics as a new tool for the identification and assessment of habitat factors with an application to breeding bird communities in oak forests. European Journal of Forest Research 123(3), 219–228.
library("coin")
library("survival")  # provides Surv()

## Tree pipit data (Mueller and Hothorn, 2004)
## Asymptotic maximally selected statistics
maxstat_test(counts ~ coverstorey, data = treepipit)

## Asymptotic maximally selected statistics
## Note: all covariates simultaneously
mt <- maxstat_test(counts ~ ., data = treepipit)
mt@estimates$estimate

## Malignant arrhythmias data (Hothorn and Lausen, 2003, Sec. 7.2)
## Asymptotic maximally selected statistics
maxstat_test(Surv(time, event) ~ EF, data = hohnloser,
             ytrafo = function(data)
                 trafo(data, surv_trafo = function(y)
                     logrank_trafo(y, ties.method = "Hothorn-Lausen")))

## Breast cancer data (Hothorn and Lausen, 2003, Sec. 7.3)
## Asymptotic maximally selected statistics
data("sphase", package = "TH.data")
maxstat_test(Surv(RFS, event) ~ SPF, data = sphase,
             ytrafo = function(data)
                 trafo(data, surv_trafo = function(y)
                     logrank_trafo(y, ties.method = "Hothorn-Lausen")))

## Job satisfaction data (Agresti, 2002, p. 288, Tab. 7.8)
## Asymptotic maximally selected statistics
maxstat_test(jobsatisfaction)

## Asymptotic maximally selected statistics
## Note: 'Job.Satisfaction' and 'Income' as ordinal
maxstat_test(jobsatisfaction,
             scores = list("Job.Satisfaction" = 1:4,
                           "Income" = 1:4))
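The treepipit example above searches over all covariates simultaneously. As a rough base R illustration of that kind of search, in the spirit of Lausen et al. (2004) but not coin's implementation, the sketch below rank-transforms the response (an assumption of this sketch, giving a maximally selected rank statistic), standardizes the two-sample statistic for every cutpoint of every covariate with its permutation mean and variance, and reports the covariate and cutpoint attaining the overall maximum. Further assumptions: every observed value except the largest is a candidate cutpoint (no minprob/maxprob restriction), the covariates are numeric columns of a data frame, the helper names are hypothetical, and no p-value is computed.

```r
## Hypothetical helpers (not part of coin).

## Standardized two-sample statistics for all cutpoints of one covariate
## (permutation mean and variance of the group sum).
std_stats <- function(y, x) {
  n    <- length(y)
  vy   <- mean((y - mean(y))^2)
  cuts <- head(sort(unique(x)), -1)         # each split leaves two non-empty groups
  z <- vapply(cuts, function(cp) {
    g  <- x <= cp
    n1 <- sum(g)
    (sum(y[g]) - n1 * mean(y)) / sqrt(vy * n1 * (n - n1) / (n - 1))
  }, numeric(1))
  list(cuts = cuts, z = z)
}

## Maximally selected rank statistic over several covariates.
maxstat_multi <- function(y, covariates) {
  ry  <- rank(y)                            # rank-transformed response
  per <- lapply(covariates, function(x) std_stats(ry, x))
  best_stat <- vapply(per, function(s) max(abs(s$z)), numeric(1))
  j <- which.max(best_stat)
  s <- per[[j]]
  list(statistic = best_stat[[j]], covariate = names(covariates)[j],
       cutpoint = s$cuts[which.max(abs(s$z))])
}

d <- data.frame(a = c(3, 1, 4, 1, 5, 9, 2, 6),
                b = 1:8)
y <- c(10, 11, 12, 13, 30, 31, 32, 33)
res <- maxstat_multi(y, d)
res$covariate  # "b"
res$cutpoint   # 4
```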