Transformations: Functions for Data Transformation
In coin: Conditional Inference Procedures in a Permutation Test Framework

Description

Transformations for factors and numeric variables.

Usage

id_trafo(x)
rank_trafo(x, ties.method = c("mid-ranks", "random"))
normal_trafo(x, ties.method = c("mid-ranks", "average-scores"))
median_trafo(x, mid.score = c("0", "0.5", "1"))
savage_trafo(x, ties.method = c("mid-ranks", "average-scores"))
consal_trafo(x, ties.method = c("mid-ranks", "average-scores"), a = 5)
koziol_trafo(x, ties.method = c("mid-ranks", "average-scores"), j = 1)
klotz_trafo(x, ties.method = c("mid-ranks", "average-scores"))
mood_trafo(x, ties.method = c("mid-ranks", "average-scores"))
ansari_trafo(x, ties.method = c("mid-ranks", "average-scores"))
fligner_trafo(x, ties.method = c("mid-ranks", "average-scores"))
logrank_trafo(x, ties.method = c("mid-ranks", "Hothorn-Lausen",
                                 "average-scores"),
              weight = logrank_weight, ...)
logrank_weight(time, n.risk, n.event,
               type = c("logrank", "Gehan-Breslow", "Tarone-Ware",
                        "Peto-Peto", "Prentice", "Prentice-Marek",
                        "Andersen-Borgan-Gill-Keiding", "Fleming-Harrington",
                        "Gaugler-Kim-Liao", "Self"),
               rho = NULL, gamma = NULL)
f_trafo(x)
of_trafo(x, scores = NULL)
zheng_trafo(x, increment = 0.1)
maxstat_trafo(x, minprob = 0.1, maxprob = 1 - minprob)
fmaxstat_trafo(x, minprob = 0.1, maxprob = 1 - minprob)
ofmaxstat_trafo(x, minprob = 0.1, maxprob = 1 - minprob)
trafo(data, numeric_trafo = id_trafo, factor_trafo = f_trafo,
      ordered_trafo = of_trafo, surv_trafo = logrank_trafo,
      var_trafo = NULL, block = NULL)
mcp_trafo(...)

Arguments

`x`	an object of class `"numeric"`, `"factor"`, `"ordered"` or `"Surv"`.
`ties.method`	a character, the method used to handle ties. The score generating function either uses the mid-ranks (`"mid-ranks"`, default) or, in the case of `rank_trafo()`, randomly broken ties (`"random"`). Alternatively, the average of the scores resulting from applying the score generating function to randomly broken ties is used (`"average-scores"`). See `logrank_test()` for a detailed description of the methods used in `logrank_trafo()`.
`mid.score`	a character, the score assigned to observations exactly equal to the median: either 0 (`"0"`, default), 0.5 (`"0.5"`) or 1 (`"1"`); see `median_test()`.
`a`	a numeric vector, the values taken as the constant `a` in the Conover-Salsburg scores. Defaults to `5`.
`j`	a numeric, the value taken as the constant `j` in the Koziol-Nemec scores. Defaults to `1`.
`weight`	a function where the first three arguments must correspond to `time`, `n.risk`, and `n.event` given below. Defaults to `logrank_weight`.
`time`	a numeric vector, the ordered distinct time points.
`n.risk`	a numeric vector, the number of subjects at risk at each time point specified in `time`.
`n.event`	a numeric vector, the number of events at each time point specified in `time`.
`type`	a character, one of `"logrank"` (default), `"Gehan-Breslow"`, `"Tarone-Ware"`, `"Peto-Peto"`, `"Prentice"`, `"Prentice-Marek"`, `"Andersen-Borgan-Gill-Keiding"`, `"Fleming-Harrington"`, `"Gaugler-Kim-Liao"` or `"Self"`; see `logrank_test()`.
`rho`	a numeric vector, the `\rho` constant when `type` is `"Tarone-Ware"`, `"Fleming-Harrington"`, `"Gaugler-Kim-Liao"` or `"Self"`; see `logrank_test()`. Defaults to `NULL`, implying `0.5` for `type = "Tarone-Ware"` and `0` otherwise.
`gamma`	a numeric vector, the `\gamma` constant when `type` is `"Fleming-Harrington"`, `"Gaugler-Kim-Liao"` or `"Self"`; see `logrank_test()`. Defaults to `NULL`, implying `0`.
`scores`	a numeric vector or list, the scores corresponding to each level of an ordered factor. Defaults to `NULL`, implying `1:nlevels(x)`.
`increment`	a numeric, the score increment between the order-restricted sets of scores. A fraction greater than 0, but smaller than or equal to 1. Defaults to `0.1`.
`minprob`	a numeric, a fraction between 0 and 0.5; see `maxstat_test()`. Defaults to `0.1`.
`maxprob`	a numeric, a fraction between 0.5 and 1; see `maxstat_test()`. Defaults to `1 - minprob`.
`data`	an object of class `"data.frame"`.
`numeric_trafo`	a function to be applied to elements of class `"numeric"` in `data`, returning a matrix with `nrow(data)` rows and an arbitrary number of columns. Defaults to `id_trafo`.
`factor_trafo`	a function to be applied to elements of class `"factor"` in `data`, returning a matrix with `nrow(data)` rows and an arbitrary number of columns. Defaults to `f_trafo`.
`ordered_trafo`	a function to be applied to elements of class `"ordered"` in `data`, returning a matrix with `nrow(data)` rows and an arbitrary number of columns. Defaults to `of_trafo`.
`surv_trafo`	a function to be applied to elements of class `"Surv"` in `data`, returning a matrix with `nrow(data)` rows and an arbitrary number of columns. Defaults to `logrank_trafo`.
`var_trafo`	an optional named list of functions to be applied to the corresponding variables in `data`. Defaults to `NULL`.
`block`	an optional factor whose levels are interpreted as blocks. `trafo` is applied to each level of `block` separately. Defaults to `NULL`.
`...`	`logrank_trafo()`: further arguments to be passed to `weight`. `mcp_trafo()`: factor name and contrast matrix (as matrix or character) in a `tag = value` format for multiple comparisons based on a single unordered factor; see `mcp()` in package multcomp.

Details

The utility functions documented here are used to define specialized test procedures.

id_trafo() is the identity transformation.

rank_trafo(), normal_trafo(), median_trafo(), savage_trafo(), consal_trafo() and koziol_trafo() compute rank (Wilcoxon) scores, normal (van der Waerden) scores, median (Mood-Brown) scores, Savage scores, Conover-Salsburg scores (see neuropathy) and Koziol-Nemec scores, respectively, for location problems.
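
To make the first two location scores concrete, here is a minimal base-R sketch of mid-ranks and normal (van der Waerden) scores. It assumes the textbook definition of normal scores as standard normal quantiles of r / (n + 1); it does not call coin, and the package's own functions may differ in details such as the handling of `ties.method` and missing values.

```r
# Base-R sketch of mid-rank and normal (van der Waerden) scores.
# Assumption: normal scores are qnorm(r / (n + 1)) for mid-ranks r.
x <- c(2.1, 3.5, 3.5, 0.7, 5.0)
n <- sum(!is.na(x))

# Mid-ranks: tied observations share the average of their ranks.
mid_ranks <- rank(x, ties.method = "average")

# Normal scores: standard normal quantiles of the scaled mid-ranks.
normal_scores <- qnorm(mid_ranks / (n + 1))

print(mid_ranks)
print(normal_scores)
```

The two tied values 3.5 both receive the mid-rank 3.5 and therefore the same normal score.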

klotz_trafo(), mood_trafo(), ansari_trafo() and fligner_trafo() compute Klotz scores, Mood scores, Ansari-Bradley scores and Fligner-Killeen scores, respectively, for scale problems.
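
As a rough illustration of three of these scale scores, the base-R sketch below uses the usual textbook formulas for untied data: squared normal scores (Klotz), squared distance from the mean rank (Mood), and distance from the nearer end of the ranking (Ansari-Bradley). These formulas are an assumption about the definitions, not a copy of coin's code, and tie handling is ignored.

```r
# Base-R sketch of Klotz, Mood and Ansari-Bradley scores (no ties).
# Assumption: textbook definitions; coin's functions also handle ties.
x <- c(1.2, -0.4, 3.3, 0.8, -2.5, 2.0)
n <- length(x)
r <- rank(x)

klotz  <- qnorm(r / (n + 1))^2   # squared normal scores
mood   <- (r - (n + 1) / 2)^2    # squared distance from the mean rank
ansari <- pmin(r, n - r + 1)     # distance from the nearer end of the ranking

print(cbind(x, r, klotz, mood, ansari))
```

Klotz and Mood scores grow for observations far from the centre of the ranking, whereas Ansari-Bradley scores shrink; all three depend on the data only through the ranks.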

logrank_trafo() computes weighted logrank scores for right-censored data, allowing for a user-defined weight function through the weight argument (see GTSG).

f_trafo() computes dummy matrices for factors and of_trafo() assigns scores to ordered factors. For ordered factors with two levels, the scores are normalized to the [0, 1] range. zheng_trafo() computes a finite collection of order-restricted scores for ordered factors (see jobsatisfaction, malformations and vision).
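
The idea behind dummy matrices and ordered-factor scores can be sketched in base R as follows. `dummy_matrix()` and `ordered_scores()` are hypothetical helpers written for this illustration, not functions of coin, and they leave out details such as the single-column result for two-level factors and the [0, 1] normalization mentioned above.

```r
# Hypothetical helpers illustrating dummy coding and ordered scores.
x  <- gl(3, 2)          # factor with levels "1", "2", "3"
ox <- as.ordered(x)

# One 0/1 indicator column per factor level.
dummy_matrix <- function(f) {
  sapply(levels(f), function(l) as.integer(f == l))
}

# Map each observation to the score of its level; default 1:nlevels(f).
ordered_scores <- function(f, scores = NULL) {
  if (is.null(scores)) scores <- seq_len(nlevels(f))
  scores[as.integer(f)]
}

print(dummy_matrix(x))
print(ordered_scores(ox))
print(ordered_scores(ox, scores = c(1, 3, 4)))
```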

maxstat_trafo(), fmaxstat_trafo() and ofmaxstat_trafo() compute scores for cutpoint problems (see maxstat_test()).
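
For a numeric variable, cutpoint scores can be pictured as one 0/1 column per candidate cutpoint. The base-R sketch below is an assumption about the general idea only: it keeps those observed values whose share of observations at or below them lies between `minprob` and `maxprob`, and the exact selection rule used by coin may differ.

```r
# Base-R sketch of indicator scores for candidate cutpoints.
# Assumption: a cutpoint is admissible if the fraction of observations
# at or below it lies within [minprob, maxprob].
x <- c(3, 1, 4, 1.5, 9, 2.6, 5, 3.5, 8, 7)
minprob <- 0.1
maxprob <- 1 - minprob

candidates <- sort(unique(x))
share <- sapply(candidates, function(cp) mean(x <= cp))
cutpoints <- candidates[share >= minprob & share <= maxprob]

# One column per cutpoint: 1 if the observation is at or below it.
ind <- sapply(cutpoints, function(cp) as.integer(x <= cp))
colnames(ind) <- paste0("x <= ", cutpoints)

print(ind)
```

Each column splits the sample into two groups, so a test statistic can be computed per column and maximized over the cutpoints.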

trafo() applies its arguments to the elements of data according to the classes of the elements. A trafo() function with modified default arguments is usually supplied to independence_test() via the xtrafo or ytrafo arguments. Fine tuning, i.e., different transformations for different variables, is possible by supplying a named list of functions to the var_trafo argument.
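
The class-based dispatch described here can be sketched in base R. `apply_by_class()` is a hypothetical stand-in written for illustration, handling only numeric and factor columns; it is not coin's implementation and omits `var_trafo`, `block`, ordered factors and `Surv` objects.

```r
# Hypothetical sketch of class-based dispatch over data frame columns.
apply_by_class <- function(data, numeric_fun, factor_fun) {
  cols <- lapply(names(data), function(nm) {
    v <- data[[nm]]
    # Choose the transformation by the class of the column.
    m <- as.matrix(if (is.factor(v)) factor_fun(v) else numeric_fun(v))
    colnames(m) <- if (ncol(m) == 1) nm else paste(nm, seq_len(ncol(m)), sep = ".")
    m
  })
  # Bind the per-variable matrices column-wise into one score matrix.
  do.call(cbind, cols)
}

dat <- data.frame(g = gl(2, 3), y = c(0.3, 1.9, 0.4, 2.2, 1.1, 0.8))
res <- apply_by_class(
  dat,
  numeric_fun = function(v) rank(v),
  factor_fun  = function(f) as.integer(f == levels(f)[1])
)
print(res)
```

The result has one row per observation and one named block of columns per variable, which mirrors the shape described in the Value section.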

mcp_trafo() computes contrast matrices for factors.

Value

A numeric vector or matrix with nrow(x) rows and an arbitrary number of columns. For trafo(), a named matrix with nrow(data) rows and an arbitrary number of columns.

Note

As of coin version 1.1-0, all transformation functions pass missing values (i.e., NAs) through. Furthermore, median_trafo() and logrank_trafo() are now increasing functions (in conformity with most other transformations in this package).

Examples

## Dummy matrix, two-sample problem (only one column)
f_trafo(gl(2, 3))

## Dummy matrix, K-sample problem (K columns)
x <- gl(3, 2)
f_trafo(x)

## Score matrix
ox <- as.ordered(x)
of_trafo(ox)
of_trafo(ox, scores = c(1, 3:4))
of_trafo(ox, scores = list(s1 = 1:3, s2 = c(1, 3:4)))
zheng_trafo(ox, increment = 1/3)

## Normal scores
y <- runif(6)
normal_trafo(y)

## All together now
trafo(data.frame(x = x, ox = ox, y = y), numeric_trafo = normal_trafo)

## The same, but allows for fine-tuning
trafo(data.frame(x = x, ox = ox, y = y), var_trafo = list(y = normal_trafo))

## Transformations for maximally selected statistics
maxstat_trafo(y)
fmaxstat_trafo(x)
ofmaxstat_trafo(ox)

## Apply transformation blockwise (as in the Friedman test)
trafo(data.frame(y = 1:20), numeric_trafo = rank_trafo, block = gl(4, 5))

## Multiple comparisons
dta <- data.frame(x)
mcp_trafo(x = "Tukey")(dta)

## The same, but useful when specific contrasts are desired
K <- rbind("2 - 1" = c(-1,  1, 0),
           "3 - 1" = c(-1,  0, 1),
           "3 - 2" = c( 0, -1, 1))
mcp_trafo(x = K)(dta)

coin documentation built on Sept. 27, 2023, 5:09 p.m.