Transformations: Functions for Data Transformation

Description

Transformations for factors and numeric variables.

Usage

id_trafo(x)
rank_trafo(x, ties.method = c("mid-ranks", "random"))
normal_trafo(x, ties.method = c("mid-ranks", "average-scores"))
median_trafo(x, mid.score = c("0", "0.5", "1"))
savage_trafo(x, ties.method = c("mid-ranks", "average-scores"))
consal_trafo(x, ties.method = c("mid-ranks", "average-scores"), a = 5)
koziol_trafo(x, ties.method = c("mid-ranks", "average-scores"), j = 1)
klotz_trafo(x, ties.method = c("mid-ranks", "average-scores"))
mood_trafo(x, ties.method = c("mid-ranks", "average-scores"))
ansari_trafo(x, ties.method = c("mid-ranks", "average-scores"))
fligner_trafo(x, ties.method = c("mid-ranks", "average-scores"))
logrank_trafo(x, ties.method = c("mid-ranks", "Hothorn-Lausen",
                                 "average-scores"),
              weight = logrank_weight, ...)
logrank_weight(time, n.risk, n.event,
               type = c("logrank", "Gehan-Breslow", "Tarone-Ware",
                        "Peto-Peto", "Prentice", "Prentice-Marek",
                        "Andersen-Borgan-Gill-Keiding", "Fleming-Harrington",
                        "Gaugler-Kim-Liao", "Self"),
               rho = NULL, gamma = NULL)
f_trafo(x)
of_trafo(x, scores = NULL)
zheng_trafo(x, increment = 0.1)
maxstat_trafo(x, minprob = 0.1, maxprob = 1 - minprob)
fmaxstat_trafo(x, minprob = 0.1, maxprob = 1 - minprob)
ofmaxstat_trafo(x, minprob = 0.1, maxprob = 1 - minprob)
trafo(data, numeric_trafo = id_trafo, factor_trafo = f_trafo,
      ordered_trafo = of_trafo, surv_trafo = logrank_trafo,
      var_trafo = NULL, block = NULL)
mcp_trafo(...)

Arguments

x

an object of class "numeric", "factor", "ordered" or "Surv".

ties.method

a character, the method used to handle ties. The score generating function either uses the mid-ranks ("mid-ranks", default) or, in the case of rank_trafo(), randomly broken ties ("random"). Alternatively, the average of the scores obtained by applying the score generating function to randomly broken ties is used ("average-scores"). See logrank_test() for a detailed description of the methods used in logrank_trafo().

mid.score

a character, the score assigned to observations exactly equal to the median: either 0 ("0", default), 0.5 ("0.5") or 1 ("1"); see median_test().

a

a numeric vector, the values taken as the constant a in the Conover-Salsburg scores. Defaults to 5.

j

a numeric, the value taken as the constant j in the Koziol-Nemec scores. Defaults to 1.

weight

a function where the first three arguments must correspond to time, n.risk, and n.event given below. Defaults to logrank_weight.

time

a numeric vector, the ordered distinct time points.

n.risk

a numeric vector, the number of subjects at risk at each time point specified in time.

n.event

a numeric vector, the number of events at each time point specified in time.

type

a character, one of "logrank" (default), "Gehan-Breslow", "Tarone-Ware", "Peto-Peto", "Prentice", "Prentice-Marek", "Andersen-Borgan-Gill-Keiding", "Fleming-Harrington", "Gaugler-Kim-Liao" or "Self"; see logrank_test().

rho

a numeric vector, the constant ρ when type is "Tarone-Ware", "Fleming-Harrington", "Gaugler-Kim-Liao" or "Self"; see logrank_test(). Defaults to NULL, implying 0.5 for type = "Tarone-Ware" and 0 otherwise.

gamma

a numeric vector, the constant γ when type is "Fleming-Harrington", "Gaugler-Kim-Liao" or "Self"; see logrank_test(). Defaults to NULL, implying 0.

scores

a numeric vector or list, the scores corresponding to each level of an ordered factor. Defaults to NULL, implying 1:nlevels(x).

increment

a numeric, the score increment between the order-restricted sets of scores. A fraction greater than 0, but smaller than or equal to 1. Defaults to 0.1.

minprob

a numeric, a fraction between 0 and 0.5; see maxstat_test(). Defaults to 0.1.

maxprob

a numeric, a fraction between 0.5 and 1; see maxstat_test(). Defaults to 1 - minprob.

data

an object of class "data.frame".

numeric_trafo

a function to be applied to elements of class "numeric" in data, returning a matrix with nrow(data) rows and an arbitrary number of columns. Defaults to id_trafo.

factor_trafo

a function to be applied to elements of class "factor" in data, returning a matrix with nrow(data) rows and an arbitrary number of columns. Defaults to f_trafo.

ordered_trafo

a function to be applied to elements of class "ordered" in data, returning a matrix with nrow(data) rows and an arbitrary number of columns. Defaults to of_trafo.

surv_trafo

a function to be applied to elements of class "Surv" in data, returning a matrix with nrow(data) rows and an arbitrary number of columns. Defaults to logrank_trafo.

var_trafo

an optional named list of functions to be applied to the corresponding variables in data. Defaults to NULL.

block

an optional factor whose levels are interpreted as blocks. trafo is applied to each level of block separately. Defaults to NULL.

...

For logrank_trafo(): further arguments to be passed to weight. For mcp_trafo(): the factor name and contrast matrix (as a matrix or character) in 'tag = value' format, for multiple comparisons based on a single unordered factor; see mcp() in package multcomp.

Details

The utility functions documented here are used to define specialized test procedures.

id_trafo() is the identity transformation.

rank_trafo(), normal_trafo(), median_trafo(), savage_trafo(), consal_trafo() and koziol_trafo() compute rank (Wilcoxon) scores, normal (van der Waerden) scores, median (Mood-Brown) scores, Savage scores, Conover-Salsburg scores (see neuropathy) and Koziol-Nemec scores, respectively, for location problems.
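
As a rough illustration of what some of these location scores look like, the following base R sketch computes mid-ranks, normal scores and median scores by hand. It assumes the usual textbook definitions (normal scores as qnorm(rank / (n + 1)), median scores as an indicator of exceeding the sample median); the variable names are made up for the example, and the package's own functions may differ in details.

```r
# Base R sketch of three location scores (textbook definitions, assumed here).
y <- c(2.1, 3.4, 3.4, 5.0, 7.2)    # toy data containing one tie
n <- length(y)

r   <- rank(y)                     # mid-ranks: tied values share the average rank
vdw <- qnorm(r / (n + 1))          # normal (van der Waerden) scores
med <- as.numeric(y > median(y))   # median scores; values equal to the median get 0

cbind(y, r, vdw, med)
```

Setting the score of observations equal to the median to 0 corresponds to the mid.score = "0" choice described under Arguments.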

klotz_trafo(), mood_trafo(), ansari_trafo() and fligner_trafo() compute Klotz scores, Mood scores, Ansari-Bradley scores and Fligner-Killeen scores, respectively, for scale problems.
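
The scale scores can be sketched in the same way. The block below uses commonly cited definitions (Mood: squared distance of the rank from the mean rank; Ansari-Bradley: distance from the nearer end of the ranking; Klotz: squared normal scores). These formulas are assumptions made for illustration and are not taken from the package source.

```r
# Base R sketch of three scale scores (textbook definitions, assumed here).
y <- c(4, 1, 3, 5, 2)              # toy data without ties
n <- length(y)
r <- rank(y)                       # ranks: 4 1 3 5 2

mood   <- (r - (n + 1) / 2)^2      # large for observations at either extreme
ansari <- pmin(r, n - r + 1)       # small for observations at either extreme
klotz  <- qnorm(r / (n + 1))^2     # squared normal scores

cbind(y, r, mood, ansari, klotz)
```

All three scores depend only on how far an observation lies from the centre of the ranking, which is why they are suited to scale rather than location problems.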

logrank_trafo() computes weighted logrank scores for right-censored data, allowing for a user-defined weight function through the weight argument (see GTSG).
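
A user-defined weight function only needs to accept time, n.risk and n.event as its first three arguments. The sketch below builds these three vectors by hand for a small right-censored sample and applies a hypothetical weight function, my_weight, that raises the number at risk to the power rho (the Tarone-Ware form for rho = 0.5). Both the function name and the way the inputs are constructed are assumptions for illustration; the exact vectors that logrank_trafo() passes to weight may be built differently.

```r
# Toy right-censored sample: status 1 = event, 0 = censored.
time   <- c(2, 3, 3, 5, 8, 8, 9)
status <- c(1, 1, 0, 1, 1, 1, 0)

tp      <- sort(unique(time))                                 # ordered distinct time points
n.risk  <- sapply(tp, function(s) sum(time >= s))             # subjects still at risk
n.event <- sapply(tp, function(s) sum(time == s & status == 1))  # events at each time

# Hypothetical weight function with the required leading arguments.
my_weight <- function(time, n.risk, n.event, rho = 0.5) n.risk^rho

w <- my_weight(tp, n.risk, n.event)
cbind(tp, n.risk, n.event, w)
```

A function of this shape could then be supplied as weight = my_weight, with rho passed through the ... argument.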

f_trafo() computes dummy matrices for factors and of_trafo() assigns scores to ordered factors. For ordered factors with two levels, the scores are normalized to the [0, 1] range. zheng_trafo() computes a finite collection of order-restricted scores for ordered factors (see jobsatisfaction, malformations and vision).
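
To make the two ideas concrete, the following base R sketch builds a K-column indicator matrix for a factor and maps the levels of an ordered factor to user-chosen scores. It is a hand-rolled approximation of the behaviour described above, not the package's implementation, and the object names are illustrative.

```r
x  <- gl(3, 2)                     # factor with 3 levels, 2 observations each
ox <- as.ordered(x)

# Dummy matrix: one indicator column per level.
dummy <- sapply(levels(x), function(l) as.numeric(x == l))

# Ordered scores: look up each observation's level in a score vector.
scores <- c(1, 3, 4)               # one score per level, in level order
s <- scores[as.integer(ox)]

dummy
s
```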

maxstat_trafo(), fmaxstat_trafo() and ofmaxstat_trafo() compute scores for cutpoint problems (see maxstat_test()).
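
The idea behind cutpoint scores can be sketched as one indicator column per candidate cutpoint. The block below keeps only those cutpoints for which the fraction of observations at or below the cutpoint lies between minprob and maxprob. This selection rule is one plausible reading of the two arguments and is an assumption; see maxstat_test() for the actual definition.

```r
y <- c(0.3, 1.2, 2.5, 3.1, 4.8, 5.0, 6.7, 7.4)
minprob <- 0.2
maxprob <- 1 - minprob

cuts <- sort(unique(y))
frac <- sapply(cuts, function(ct) mean(y <= ct))   # share of data at or below each cut
cuts <- cuts[frac >= minprob & frac <= maxprob]    # drop cuts too close to the edges

# One 0/1 column per admissible cutpoint.
ind <- sapply(cuts, function(ct) as.numeric(y <= ct))
colnames(ind) <- paste0("y <= ", cuts)
ind
```

Each column splits the sample into two groups, so a maximally selected statistic can be formed by taking the maximum of the standardized statistics over the columns.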

trafo() applies its arguments to the elements of data according to the classes of the elements. A trafo() function with modified default arguments is usually supplied to independence_test() via the xtrafo or ytrafo arguments. Fine-tuning, i.e., using different transformations for different variables, is possible by supplying a named list of functions to the var_trafo argument.
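
The dispatch-by-class idea, including per-variable overrides, can be sketched in a few lines of base R. The function dispatch_trafo and its arguments are hypothetical stand-ins written for this illustration; they mimic the behaviour described above for numeric and factor columns only and are not the package's code.

```r
# Hypothetical stand-in for class-based dispatch with per-variable overrides.
dispatch_trafo <- function(data, numeric_fun = identity,
                           factor_fun = function(f)
                               sapply(levels(f), function(l) as.numeric(f == l)),
                           var_fun = NULL) {
    cols <- lapply(names(data), function(nm) {
        v <- data[[nm]]
        if (!is.null(var_fun[[nm]])) return(var_fun[[nm]](v))  # named override wins
        if (is.factor(v)) factor_fun(v)
        else if (is.numeric(v)) numeric_fun(v)
        else stop("no transformation defined for class ", class(v)[1])
    })
    names(cols) <- names(data)
    do.call(cbind, cols)               # one matrix, nrow(data) rows
}

dat <- data.frame(g = gl(2, 3), y = c(5, 1, 4, 2, 6, 3))
m1 <- dispatch_trafo(dat, numeric_fun = rank)        # change the class default
m2 <- dispatch_trafo(dat, var_fun = list(y = rank))  # override a single variable
```

With only one numeric column in dat, both calls produce the same matrix; the two routes differ once several numeric variables need different transformations.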

mcp_trafo() computes contrast matrices for factors.
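
One way to picture the result is as the dummy matrix of the factor multiplied by the transposed contrast matrix, so that each column corresponds to one comparison. The sketch below does this by hand for all pairwise comparisons of a three-level factor. That this product matches the output of mcp_trafo() exactly is an assumption made for illustration.

```r
x <- gl(3, 2)                                         # factor with 3 levels
D <- sapply(levels(x), function(l) as.numeric(x == l))  # dummy matrix, 6 x 3

# All pairwise (Tukey-type) contrasts, one row per comparison.
K <- rbind("2 - 1" = c(-1,  1, 0),
           "3 - 1" = c(-1,  0, 1),
           "3 - 2" = c( 0, -1, 1))

Z <- D %*% t(K)                                       # 6 x 3, one column per contrast
Z
```

Each row of Z holds the contrast coefficients of the level that the observation belongs to.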

Value

A numeric vector or matrix with nrow(x) rows and an arbitrary number of columns. For trafo(), a named matrix with nrow(data) rows and an arbitrary number of columns.

Note

Starting with coin version 1.1-0, all transformation functions pass through missing values (i.e., NAs). Furthermore, median_trafo() and logrank_trafo() are now increasing functions, in conformity with most other transformations in this package.

Examples

## Dummy matrix, two-sample problem (only one column)
f_trafo(gl(2, 3))

## Dummy matrix, K-sample problem (K columns)
x <- gl(3, 2)
f_trafo(x)

## Score matrix
ox <- as.ordered(x)
of_trafo(ox)
of_trafo(ox, scores = c(1, 3:4))
of_trafo(ox, scores = list(s1 = 1:3, s2 = c(1, 3:4)))
zheng_trafo(ox, increment = 1/3)

## Normal scores
y <- runif(6)
normal_trafo(y)

## All together now
trafo(data.frame(x = x, ox = ox, y = y), numeric_trafo = normal_trafo)

## The same, but allows for fine-tuning
trafo(data.frame(x = x, ox = ox, y = y), var_trafo = list(y = normal_trafo))

## Transformations for maximally selected statistics
maxstat_trafo(y)
fmaxstat_trafo(x)
ofmaxstat_trafo(ox)

## Apply transformation blockwise (as in the Friedman test)
trafo(data.frame(y = 1:20), numeric_trafo = rank_trafo, block = gl(4, 5))

## Multiple comparisons
dta <- data.frame(x)
mcp_trafo(x = "Tukey")(dta)

## The same, but useful when specific contrasts are desired
K <- rbind("2 - 1" = c(-1,  1, 0),
           "3 - 1" = c(-1,  0, 1),
           "3 - 2" = c( 0, -1, 1))
mcp_trafo(x = K)(dta)

coin documentation built on Sept. 27, 2023, 5:09 p.m.