# maxstat.test: Maximally Selected Rank Statistics (package maxstat)

## Description

Performs a test of independence of a response and one or more covariables using maximally selected rank statistics.

## Usage

```
## S3 method for class 'data.frame'
maxstat.test(formula, data, subset, na.action, ...)

maxstat(y, x=NULL, weights = NULL,
        smethod=c("Wilcoxon", "Median", "NormalQuantil", "LogRank", "Data"),
        pmethod=c("none", "Lau92", "Lau94", "exactGauss", "HL", "condMC", "min"),
        iscores=(pmethod=="HL"), minprop = 0.1, maxprop=0.9,
        alpha = NULL, keepxy=TRUE, ...)
```

## Arguments

| Argument | Description |
| --- | --- |
| `y` | numeric vector of data values, dependent variable. |
| `x` | numeric vector of data values, independent variable. |
| `weights` | an optional numeric vector of non-negative weights, summing to the number of observations. |
| `smethod` | kind of statistic to be computed, i.e. defines the scores to be used for computing the statistic. |
| `pmethod` | kind of p-value approximation to be used. |
| `iscores` | logical: should the scores be mapped into integers `1:length(x)`? This is `TRUE` by default for `pmethod=="HL"` and `FALSE` otherwise. |
| `minprop` | at least `minprop` × 100% of the observations must fall into the first group. |
| `maxprop` | not more than `maxprop` × 100% of the observations may fall into the first group. |
| `alpha` | significance level; the appropriate quantile is computed if `alpha` is specified. Used for plotting within `plot.maxtest`. |
| `keepxy` | logical: return `y` and `x` as elements of the `maxtest` object. |
| `formula` | a formula describing the model to be tested, of the form `lhs ~ rhs` where `lhs` is the response variable. For survival problems, i.e. using the log-rank statistic, the formula is of the form `Surv(time, event) ~ rhs`, see Details. |
| `data` | a data frame containing the variables in the model formula. `data` is required. |
| `subset` | an optional vector specifying a subset of observations to be used. |
| `na.action` | a function which indicates what should happen when the data contain `NA`s. Defaults to `getOption("na.action")`. |
| `...` | additional parameters to be passed to `pmvnorm`, or `B`, an integer defining the number of Monte-Carlo replications. |

## Details

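The predictive power of a variable `x` for a dependent variable `y` can be assessed by a maximally selected rank statistic: the observations are split into two groups at every admissible cutpoint of `x`, a standardized two-sample rank statistic is computed for each split, and the maximum over all cutpoints is used as the test statistic.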
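To make the idea concrete, the following base R sketch computes a maximally selected Wilcoxon-type statistic by hand. It is an illustration under stated assumptions, not the package's implementation: the function name `max_selected_wilcoxon` is hypothetical, the first group is taken as `x <= cutpoint`, and the absolute rank sum is standardized by its permutation mean and variance.

```r
# Illustrative sketch only (hypothetical helper, not part of maxstat).
max_selected_wilcoxon <- function(y, x, minprop = 0.1, maxprop = 0.9) {
  n <- length(y)
  scores <- rank(y)                       # Wilcoxon scores are the ranks of y
  cuts <- sort(unique(x))
  cuts <- cuts[-length(cuts)]             # the largest x would leave group 2 empty
  # proportion of observations in the first group (x <= cut)
  prop <- vapply(cuts, function(cc) mean(x <= cc), numeric(1))
  cuts <- cuts[prop >= minprop & prop <= maxprop]
  stat <- vapply(cuts, function(cc) {
    in_first <- x <= cc
    m <- sum(in_first)
    s <- sum(scores[in_first])            # rank sum of the first group
    e <- m * mean(scores)                 # its permutation expectation
    v <- m * (n - m) / (n * (n - 1)) * sum((scores - mean(scores))^2)
    abs(s - e) / sqrt(v)                  # standardized absolute statistic
  }, numeric(1))
  best <- which.max(stat)
  list(statistic = stat[best], estimate = cuts[best])
}

# Toy data: y increases with x, so the split in the middle separates best.
res <- max_selected_wilcoxon(y = 1:20, x = 1:20)
res$estimate
res$statistic
```

The p-value approximations described below are needed because the maximum is taken over many correlated statistics, so a single-cutpoint reference distribution would be too optimistic.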

`smethod` determines the kind of statistic to be used. `Wilcoxon` and `Median` denote maximally selected Wilcoxon and Median statistics. `NormalQuantil` and `LogRank` denote van der Waerden and log-rank scores.
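As a rough guide to what these scores look like, the sketch below computes Wilcoxon, median and van der Waerden scores for a small vector using common textbook definitions. These definitions are assumptions for illustration; the package may use slightly different conventions (for example in the handling of ties or the median itself).

```r
# Textbook score definitions (assumed, not taken from the maxstat sources).
y <- c(2.3, 0.7, 5.1, 3.3, 1.9)
n <- length(y)
r <- rank(y)

wilcoxon_scores <- r                          # ranks of the response
median_scores   <- as.numeric(y > median(y))  # 1 if above the median, else 0
vdw_scores      <- qnorm(r / (n + 1))         # van der Waerden (normal quantile) scores

cbind(y, wilcoxon_scores, median_scores, vdw_scores)
```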

`pmethod` specifies which kind of approximation of the p-value should be used. `Lau92` is the limiting distribution by a Brownian bridge (see Lausen and Schumacher, 1992, and `pLausen92`), `Lau94` the approximation based on an improved Bonferroni inequality (see Lausen, Sauerbrei and Schumacher, 1994, and `pLausen94`).

`exactGauss` returns the exact p-value for a maximally selected Gauss statistic, see Hothorn and Lausen (2003).

`HL` is a small sample approximation based on the Streitberg-Röhmel algorithm (see `pperm`) introduced by Hothorn and Lausen (2003). This requires integer valued scores. For van der Waerden and log-rank scores we try to find integer valued scores having the same shape. This results in slightly different scores (and a different test); the procedure is described in Hothorn (2001) and Hothorn and Lausen (2003).

All the approximations are known to be conservative, so `min` gives the minimum p-value of all procedures.

`condMC` simulates the distribution via conditional Monte-Carlo.
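The conditional Monte-Carlo idea can be sketched in base R as follows. This is a minimal illustration, not the package's code: the helper names `max_stat` and `cond_mc_pvalue` are hypothetical, `x` is assumed to have no ties (so a cutpoint lies after each ordered observation), and the `(1 + count) / (B + 1)` p-value convention is an assumption.

```r
# Hypothetical helper: maximally selected standardized statistic for given scores,
# assuming no ties in x.
max_stat <- function(scores, x, minprop = 0.1, maxprop = 0.9) {
  n <- length(scores)
  a <- scores[order(x)]                    # scores ordered by x
  m <- seq_len(n - 1)                      # size of the first group at each cutpoint
  keep <- m / n >= minprop & m / n <= maxprop
  s <- cumsum(a)[m]                        # score sum of the first group
  e <- m * mean(a)
  v <- m * (n - m) / (n * (n - 1)) * sum((a - mean(a))^2)
  max((abs(s - e) / sqrt(v))[keep])
}

# Conditional Monte-Carlo: permute the scores while keeping x fixed.
cond_mc_pvalue <- function(y, x, B = 999, minprop = 0.1, maxprop = 0.9) {
  scores <- rank(y)
  observed <- max_stat(scores, x, minprop, maxprop)
  permuted <- vapply(seq_len(B), function(i) {
    max_stat(sample(scores), x, minprop, maxprop)
  }, numeric(1))
  (1 + sum(permuted >= observed)) / (B + 1)
}

set.seed(29)
x <- sort(runif(20))
y <- c(rnorm(10), rnorm(10, 2))
p <- cond_mc_pvalue(y, x, B = 999)
p
```

A larger `B` reduces the Monte-Carlo error of the estimated p-value at the cost of run time.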

For survival problems, i.e. using a maximally selected log-rank statistic, the interface is similar to `survfit`. The dependent variable is a survival object `Surv(time, event)`. The argument `event` may be a numeric vector of `0` (alive) and `1` (dead) or a vector of logicals with `TRUE` indicating death.

If more than one covariable is specified on the right hand side of `formula` (or if `x` is a matrix or data frame), the variable with the smallest p-value is selected and the p-value for the global test problem of independence of `y` and every variable on the right hand side is returned (see Lausen et al., 2004).

## Value

An object of class `maxtest` or `mmaxtest` (if more than one covariable was specified) containing the following components is returned:

| Component | Description |
| --- | --- |
| `statistic` | the value of the test statistic. |
| `p.value` | the p-value for the test. |
| `smethod` | the type of test applied. |
| `pmethod` | the type of p-value approximation applied. |
| `estimate` | the estimated cutpoint (of `x`) which separates `y` best. For numeric data, the groups are defined by `x` less than or equal to `estimate` and `x` greater than `estimate`. |
| `maxstats` | a list of `maxtest` objects, one for each covariable. |
| `whichmin` | an integer specifying the element of `maxstats` with the smallest p-value. |
| `p.value` | the p-value of the global test. |
| `univp.values` | the p-values for each of the variables under test. |
| `cm` | the correlation matrix the p-value is based on. |

`plot.maxtest` and `print.maxtest` can be used for plotting and printing. If `keepxy = TRUE`, there are elements `y` and `x` giving the response and independent variable.
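A short sketch of how the components listed above might be accessed from a fitted object. It reuses the simulated data from the Examples section and only runs if the `maxstat` package is installed; the guard with `requireNamespace` is an addition for illustration.

```r
# Runs only when maxstat is available; otherwise the block is skipped.
if (requireNamespace("maxstat", quietly = TRUE)) {
  set.seed(29)
  x <- sort(runif(20))
  y <- c(rnorm(10), rnorm(10, 2))
  mydata <- data.frame(x = x, y = y)

  mod <- maxstat::maxstat.test(y ~ x, data = mydata, smethod = "Wilcoxon",
                               pmethod = "HL", minprop = 0.25, maxprop = 0.75)

  mod$statistic   # value of the maximally selected statistic
  mod$p.value     # approximated p-value
  mod$estimate    # estimated cutpoint of x
}
```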

## References

Hothorn, T. and Lausen, B. (2003). On the Exact Distribution of Maximally Selected Rank Statistics. Computational Statistics & Data Analysis, 43, 121–137.

Lausen, B. and Schumacher, M. (1992). Maximally Selected Rank Statistics. Biometrics, 48, 73–85.

Lausen, B., Sauerbrei, W. and Schumacher, M. (1994). Classification and Regression Trees (CART) used for the exploration of prognostic factors measured on different scales. In: P. Dirschedl and R. Ostermann (Eds), Computational Statistics, Heidelberg, Physica-Verlag, 483–496.

Hothorn, T. (2001). On Exact Rank Tests in R. R News, 1, 11–12.

Lausen, B., Hothorn, T., Bretz, F. and Schumacher, M. (2004). Assessment of Optimally Selected Prognostic Factors. Biometrical Journal, 46(3), 364–374.

## Examples

```
library("maxstat")

set.seed(29)

# simulated data with a shift in y for the upper half of x
x <- sort(runif(20))
y <- c(rnorm(10), rnorm(10, 2))
mydata <- data.frame(cbind(x,y))

mod <- maxstat.test(y ~ x, data=mydata, smethod="Wilcoxon", pmethod="HL",
                    minprop=0.25, maxprop=0.75, alpha=0.05)
print(mod)
plot(mod)

# adjusted for more than one prognostic factor.
library("survival")
mstat <- maxstat.test(Surv(time, cens) ~ IPI + MGE, data=DLBCL,
                      smethod="LogRank", pmethod="exactGauss",
                      abseps=0.01)
plot(mstat)

### sphase
set.seed(29)
data("sphase", package = "TH.data")

maxstat.test(Surv(RFS, event) ~ SPF, data=sphase, smethod="LogRank",
             pmethod="Lau94")

maxstat.test(Surv(RFS, event) ~ SPF, data=sphase, smethod="LogRank",
             pmethod="Lau94", iscores=TRUE)

maxstat.test(Surv(RFS, event) ~ SPF, data=sphase, smethod="LogRank",
             pmethod="HL")

maxstat.test(Surv(RFS, event) ~ SPF, data=sphase, smethod="LogRank",
             pmethod="condMC", B = 9999)

plot(maxstat.test(Surv(RFS, event) ~ SPF, data=sphase, smethod="LogRank"))
```

maxstat documentation built on May 2, 2019, 2:44 a.m.