SurvivalTests: Two- and K-Sample Tests for Censored Data
In coin: Conditional Inference Procedures in a Permutation Test Framework

Description

Testing the equality of the survival distributions in two or more independent groups.

Usage

## S3 method for class 'formula'
logrank_test(formula, data, subset = NULL, weights = NULL, ...)
## S3 method for class 'IndependenceProblem'
logrank_test(object, ties.method = c("mid-ranks", "Hothorn-Lausen",
                                     "average-scores"),
             type = c("logrank", "Gehan-Breslow", "Tarone-Ware",
                      "Peto-Peto", "Prentice", "Prentice-Marek",
                      "Andersen-Borgan-Gill-Keiding",
                      "Fleming-Harrington", "Gaugler-Kim-Liao", "Self"),
             rho = NULL, gamma = NULL, ...)

Arguments

`formula`	a formula of the form `y ~ x | block` where `y` is a survival object (see `Surv` in package survival), `x` is a factor and `block` is an optional factor for stratification.
`data`	an optional data frame containing the variables in the model formula.
`subset`	an optional vector specifying a subset of observations to be used. Defaults to `NULL`.
`weights`	an optional formula of the form `~ w` defining integer valued case weights for each observation. Defaults to `NULL`, implying equal weight for all observations.
`object`	an object inheriting from class `"IndependenceProblem"`.
`ties.method`	a character, the method used to handle ties: the score generating function either uses mid-ranks (`"mid-ranks"`, default), the Hothorn-Lausen method (`"Hothorn-Lausen"`) or averages the scores of randomly broken ties (`"average-scores"`); see ‘Details’.
`type`	a character, the type of test: either `"logrank"` (default), `"Gehan-Breslow"`, `"Tarone-Ware"`, `"Peto-Peto"`, `"Prentice"`, `"Prentice-Marek"`, `"Andersen-Borgan-Gill-Keiding"`, `"Fleming-Harrington"`, `"Gaugler-Kim-Liao"` or `"Self"`; see ‘Details’.
`rho`	a numeric, the `\rho` constant when `type` is `"Tarone-Ware"`, `"Fleming-Harrington"`, `"Gaugler-Kim-Liao"` or `"Self"`; see ‘Details’. Defaults to `NULL`, implying `0.5` for `type = "Tarone-Ware"` and `0` otherwise.
`gamma`	a numeric, the `\gamma` constant when `type` is `"Fleming-Harrington"`, `"Gaugler-Kim-Liao"` or `"Self"`; see ‘Details’. Defaults to `NULL`, implying `0`.
`...`	further arguments to be passed to `independence_test()`.

Details

logrank_test() provides the weighted logrank test reformulated as a linear rank test. The family of weighted logrank tests encompasses a large collection of tests commonly used in the analysis of survival data including, but not limited to, the standard (unweighted) logrank test, the Gehan-Breslow test, the Tarone-Ware class of tests, the Peto-Peto test, the Prentice test, the Prentice-Marek test, the Andersen-Borgan-Gill-Keiding test, the Fleming-Harrington class of tests, the Gaugler-Kim-Liao class of tests and the Self class of tests. A general description of these methods is given by Klein and Moeschberger (2003, Ch. 7). See Letón and Zuluaga (2001) for the linear rank test formulation.

The null hypothesis of equality, or conditional equality given block, of the survival distribution of y in the groups defined by x is tested. In the two-sample case, the two-sided null hypothesis is H_0\!: \theta = 1, where \theta = \lambda_2 / \lambda_1 and \lambda_s is the hazard rate in the sth sample. In case alternative = "less", the null hypothesis is H_0\!: \theta \ge 1 and the alternative is \theta < 1, i.e., the survival is lower in sample 1 than in sample 2. When alternative = "greater", the null hypothesis is H_0\!: \theta \le 1 and the alternative is \theta > 1, i.e., the survival is higher in sample 1 than in sample 2.

If x is an ordered factor, the default scores, 1:nlevels(x), can be altered using the scores argument (see independence_test()); this argument can also be used to coerce nominal factors to class "ordered". In this case, a linear-by-linear association test is computed and the direction of the alternative hypothesis can be specified using the alternative argument. This type of extension of the standard logrank test was given by Tarone (1975) and later generalized to general weights by Tarone and Ware (1977).

Let (t_i, \delta_i), i = 1, 2, \ldots, n, represent a right-censored random sample of size n, where t_i is the observed survival time and \delta_i is the status indicator (\delta_i is 0 for right-censored observations and 1 otherwise). To allow for ties in the data, let t_{(1)} < t_{(2)} < \cdots < t_{(m)} represent the m, m \le n, ordered distinct event times. At time t_{(k)}, k = 1, 2, \ldots, m, the number of events and the number of subjects at risk are given by d_k = \sum_{i = 1}^n I\!\left(t_i = t_{(k)}\,|\, \delta_i = 1\right) and n_k = n - r_k, respectively, where r_k depends on the ties handling method.

Three different methods of handling ties are available using ties.method: mid-ranks ("mid-ranks", default), the Hothorn-Lausen method ("Hothorn-Lausen") and average-scores ("average-scores"). The first and last methods are discussed and contrasted by Callaert (2003), whereas the second method is defined in Hothorn and Lausen (2003). The mid-ranks method leads to

r_k = \sum_{i = 1}^n I\!\left(t_i < t_{(k)}\right)

whereas the Hothorn-Lausen method uses

r_k = \sum_{i = 1}^n I\!\left(t_i \le t_{(k)}\right) - 1.
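To make the two definitions of r_k concrete, the following base-R sketch evaluates r_k and the resulting n_k = n - r_k under both rules for the tied, uncensored event times of the first data set in ‘Examples’. It is a hand-written transcription of the two formulas above, not the code used inside coin, and the helper name risk_set_sizes is hypothetical.

```r
## Illustration only: n_k = n - r_k under the two tie-handling rules.
## `risk_set_sizes` is a hypothetical helper, not part of coin.
risk_set_sizes <- function(time, event = rep(1, length(time))) {
    n  <- length(time)
    tk <- sort(unique(time[event == 1]))                  # distinct event times t_(k)
    r_mid <- sapply(tk, function(s) sum(time <  s))       # mid-ranks
    r_hl  <- sapply(tk, function(s) sum(time <= s) - 1)   # Hothorn-Lausen
    data.frame(t = tk, n_midranks = n - r_mid, n_HothornLausen = n - r_hl)
}

## Event times from the first data set in 'Examples' (all uncensored)
time <- c(1, 1, 5, 6, 6, 6, 6, 2, 2, 2, 3, 4, 4, 5, 5)
rs <- risk_set_sizes(time)
print(rs)
```

When exactly one observation is recorded at each event time, the two columns coincide; they differ only where observations are tied.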

The scores assigned to right-censored and uncensored observations at the kth event time are given by

C_k = \sum_{j = 1}^k w_j \frac{d_j}{n_j} \quad \mathrm{and} \quad c_k = C_k - w_k,

respectively, where w is the logrank weight. For the average-scores method, used by, e.g., the software package StatXact, the d_k events observed at the kth event time are arbitrarily ordered by assigning them distinct values t_{(k_l)}, l = 1, 2, \ldots, d_k, infinitesimally to the left of t_{(k)}. Then scores C_{k_l} and c_{k_l} are computed as indicated above, effectively assuming that no event times are tied. The scores C_k and c_k are assigned the average of the scores C_{k_l} and c_{k_l}, respectively. It then follows that the score for the ith subject is

a_i = \left\{ \begin{array}{ll} C_{k'} & \mathrm{if}~\delta_i = 0 \\ c_{k'} & \mathrm{otherwise} \end{array} \right.

where k' = \max \{k: t_i \ge t_{(k)}\}.
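As a rough illustration of how the pieces d_k, n_k, C_k, c_k and a_i fit together, the sketch below computes the scores in base R for the simplest setting: standard logrank weights (w_k = 1) with mid-ranks ties handling. It is not the implementation used by logrank_trafo(); the helper name logrank_scores_sketch is hypothetical, and assigning a score of 0 to an observation censored before the first event time (an empty sum) is an assumption of this sketch.

```r
## Illustration only: logrank scores a_i with w_k = 1 and mid-ranks.
## `logrank_scores_sketch` is a hypothetical helper, not part of coin.
logrank_scores_sketch <- function(time, event = rep(1, length(time))) {
    n  <- length(time)
    tk <- sort(unique(time[event == 1]))                       # t_(1) < ... < t_(m)
    d  <- sapply(tk, function(s) sum(time == s & event == 1))  # d_k
    nk <- n - sapply(tk, function(s) sum(time < s))            # n_k = n - r_k
    Ck <- cumsum(d / nk)                                       # C_k with w_j = 1
    ck <- Ck - 1                                               # c_k = C_k - w_k
    ## k' = max{k : t_i >= t_(k)}; 0 when t_i precedes every event time
    kp <- findInterval(time, tk)
    C0 <- c(0, Ck)            # assumption: empty sum (score 0) when k' = 0
    ifelse(event == 1, ck[pmax(kp, 1)], C0[kp + 1])
}

## Lung cancer data from 'Examples' (0 = right-censored, 1 = event)
time  <- c(257, 476, 355, 1779, 355, 191, 563, 242, 285, 16, 16, 16, 257, 16)
event <- c(0, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1)
a <- logrank_scores_sketch(time, event)
round(a, 4)
sum(a)   # zero up to rounding error
```

With mid-ranks and w_k = 1 the scores sum to zero, because each increment d_j / n_j is counted once for each of the n_j subjects still at risk at t_{(j)}.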

The type argument allows for a choice between some of the most well-known members of the family of weighted logrank tests, each corresponding to a particular weight function. The standard logrank test ("logrank", default) was suggested by Mantel (1966), Peto and Peto (1972) and Cox (1972) and has w_k = 1. The Gehan-Breslow test ("Gehan-Breslow") proposed by Gehan (1965) and later extended to K samples by Breslow (1970) is a generalization of the Wilcoxon rank-sum test, where w_k = n_k. The Tarone-Ware class of tests ("Tarone-Ware") discussed by Tarone and Ware (1977) has w_k = n_k^\rho, where \rho is a constant; \rho = 0.5 (default) was suggested by Tarone and Ware (1977), but note that \rho = 0 and \rho = 1 lead to the standard logrank test and Gehan-Breslow test, respectively. The Peto-Peto test ("Peto-Peto") suggested by Peto and Peto (1972) is another generalization of the Wilcoxon rank-sum test, where

w_k = \hat{S}_k = \prod_{j = 0}^{k - 1} \frac{n_j - d_j}{n_j}

is the left-continuous Kaplan-Meier estimator of the survival function, n_0 \equiv n and d_0 \equiv 0. The Prentice test ("Prentice") is also a generalization of the Wilcoxon rank-sum test proposed by Prentice (1978), where

w_k = \prod_{j = 1}^k \frac{n_j}{n_j + d_j}.

The Prentice-Marek test ("Prentice-Marek") is yet another generalization of the Wilcoxon rank-sum test discussed by Prentice and Marek (1979), with

w_k = \tilde{S}_k = \prod_{j = 1}^k \frac{n_j + 1 - d_j}{n_j + 1}.

The Andersen-Borgan-Gill-Keiding test ("Andersen-Borgan-Gill-Keiding") suggested by Andersen et al. (1982) is a modified version of the Prentice-Marek test using

w_k = \frac{n_k}{n_k + 1} \prod_{j = 0}^{k - 1} \frac{n_j + 1 - d_j}{n_j + 1},

where, again, n_0 \equiv n and d_0 \equiv 0. The Fleming-Harrington class of tests ("Fleming-Harrington") proposed by Fleming and Harrington (1991) uses w_k = \hat{S}_k^\rho (1 - \hat{S}_k)^\gamma, where \rho and \gamma are constants; \rho = 0 and \gamma = 0 lead to the standard logrank test, while \rho = 1 and \gamma = 0 result in the Peto-Peto test. The Gaugler-Kim-Liao class of tests ("Gaugler-Kim-Liao") discussed by Gaugler et al. (2007) is a modified version of the Fleming-Harrington class of tests, replacing \hat{S}_k with \tilde{S}_k so that w_k = \tilde{S}_k^\rho (1 - \tilde{S}_k)^\gamma, where \rho and \gamma are constants; \rho = 0 and \gamma = 0 lead to the standard logrank test, whereas \rho = 1 and \gamma = 0 result in the Prentice-Marek test. The Self class of tests ("Self") suggested by Self (1991) has w_k = v_k^\rho (1 - v_k)^\gamma, where

v_k = \frac{1}{2} \frac{t_{(k-1)} + t_{(k)}}{t_{(m)}}, \quad t_{(0)} \equiv 0

is the standardized mid-point between the (k - 1)th and the kth event time. (This is a slight generalization of Self's original proposal in order to allow for non-integer follow-up times.) Again, \rho and \gamma are constants and \rho = 0 and \gamma = 0 lead to the standard logrank test.
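The weight functions above can be written out directly from d_k, n_k and t_{(k)}. The following base-R sketch does so and checks a few of the special cases stated in the text. It is only a transcription of the formulas in this section under the stated defaults for rho and gamma; it is not coin's internal code, and the helper name logrank_weights_sketch is hypothetical.

```r
## Illustration only: the weight functions w_k listed above, in base R.
## `logrank_weights_sketch` is a hypothetical helper, not part of coin.
logrank_weights_sketch <- function(tk, d, nk, type = "logrank",
                                   rho = NULL, gamma = NULL) {
    m <- length(tk)
    if (is.null(rho))   rho   <- if (type == "Tarone-Ware") 0.5 else 0
    if (is.null(gamma)) gamma <- 0
    ## left-continuous Kaplan-Meier estimator (the j = 0 factor equals 1)
    S_hat   <- c(1, cumprod((nk - d) / nk))[seq_len(m)]
    ## Prentice-Marek estimator
    S_tilde <- cumprod((nk + 1 - d) / (nk + 1))
    ## standardized mid-points between successive event times, t_(0) = 0
    v <- 0.5 * (c(0, tk[-m]) + tk) / tk[m]
    switch(type,
        "logrank"            = rep(1, m),
        "Gehan-Breslow"      = nk,
        "Tarone-Ware"        = nk^rho,
        "Peto-Peto"          = S_hat,
        "Prentice"           = cumprod(nk / (nk + d)),
        "Prentice-Marek"     = S_tilde,
        "Andersen-Borgan-Gill-Keiding" =
            nk / (nk + 1) * c(1, S_tilde)[seq_len(m)],
        "Fleming-Harrington" = S_hat^rho * (1 - S_hat)^gamma,
        "Gaugler-Kim-Liao"   = S_tilde^rho * (1 - S_tilde)^gamma,
        "Self"               = v^rho * (1 - v)^gamma,
        stop("unknown type"))
}

## Event times, event counts and mid-rank risk set sizes for the first
## data set in 'Examples' (all observations uncensored)
tk <- 1:6
d  <- c(2, 3, 1, 2, 3, 4)
nk <- c(15, 13, 10, 9, 7, 4)
w <- function(type, ...) logrank_weights_sketch(tk, d, nk, type, ...)

## Special cases stated in the text
all.equal(w("Tarone-Ware", rho = 1), w("Gehan-Breslow"))
all.equal(w("Fleming-Harrington", rho = 1, gamma = 0), w("Peto-Peto"))
all.equal(w("Gaugler-Kim-Liao", rho = 1, gamma = 0), w("Prentice-Marek"))
```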

The conditional null distribution of the test statistic is used to obtain p-values and an asymptotic approximation of the exact distribution is used by default (distribution = "asymptotic"). Alternatively, the distribution can be approximated via Monte Carlo resampling or computed exactly for univariate two-sample problems by setting distribution to "approximate" or "exact", respectively. See asymptotic(), approximate() and exact() for details.

Value

An object inheriting from class "IndependenceTest".

Note

Peto and Peto (1972) proposed the test statistic implemented in logrank_test() and named it the logrank test. However, the Mantel-Cox test (Mantel, 1966; Cox, 1972), as implemented in survdiff() (in package survival), is also known as the logrank test. These tests are similar, but differ in the choice of probability model: the (Peto-Peto) logrank test uses the permutational variance, whereas the Mantel-Cox test is based on the hypergeometric variance.
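To illustrate what "permutational variance" means here, the sketch below standardizes a two-sample linear rank statistic (the sum of the logrank scores in one group) by its mean and variance under random permutation of the group labels, using the standard formula n_1 n_2 / (n (n - 1)) \sum_i (a_i - \bar{a})^2. It is a base-R illustration for uncensored data with mid-ranks and w_k = 1, not coin's implementation; summing the scores over the first group is an assumption of the sketch that affects only the sign of the statistic.

```r
## Illustration only: two-sample statistic with permutational variance.
## Data: first data set in 'Examples' (all observations uncensored).
time  <- c(1, 1, 5, 6, 6, 6, 6, 2, 2, 2, 3, 4, 4, 5, 5)
group <- rep(0:1, c(7, 8))

## Logrank scores (w_k = 1, mid-ranks) for uncensored data
tk <- sort(unique(time))                           # distinct event times
d  <- as.vector(table(time))                       # d_k, ordered like tk
nk <- sapply(tk, function(s) sum(time >= s))       # n_k = n - r_k
a  <- (cumsum(d / nk) - 1)[match(time, tk)]        # a_i = c_k' (all events)

## Linear statistic and its permutation moments
n  <- length(a)
n1 <- sum(group == 0)                              # assumption: sum over group 0
n2 <- n - n1
stat   <- sum(a[group == 0])
mu     <- n1 * mean(a)
sigma2 <- n1 * n2 / (n * (n - 1)) * sum((a - mean(a))^2)

z <- (stat - mu) / sqrt(sigma2)                    # standardized statistic
p <- 2 * pnorm(-abs(z))                            # two-sided normal approximation
c(statistic = z, p.value = p)
```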

Combining independence_test() or symmetry_test() with logrank_trafo() offers more flexibility than logrank_test() and allows for, among other things, maximum-type versatile test procedures (e.g., Lee, 1996; see ‘Examples’) and user-supplied logrank weights (see GTSG for tests against Weibull-type or crossing-curve alternatives).

Starting with version 1.1-0, logrank_test() replaced surv_test() which was made defunct in version 1.2-0. Furthermore, logrank_trafo() is now an increasing function for all choices of ties.method, implying that the test statistic has the same sign irrespective of the ties handling method. Consequently, the sign of the test statistic will now be the opposite of what it was in earlier versions unless ties.method = "average-scores". (In versions prior to 1.1-0, logrank_trafo() was a decreasing function when ties.method was other than "average-scores".)

Starting with version 1.2-0, mid-ranks and the Hothorn-Lausen method can no longer be specified with ties.method = "logrank" and ties.method = "HL", respectively.

References

\bibshow

Examples

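## Packages used below: survival provides Surv() and survdiff()
library("survival")
library("coin")

## Example data (Callaert, 2003, Tab. 1)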
callaert <- data.frame(
    time = c(1, 1, 5, 6, 6, 6, 6, 2, 2, 2, 3, 4, 4, 5, 5),
    group = factor(rep(0:1, c(7, 8)))
)

## Logrank scores using mid-ranks (Callaert, 2003, Tab. 2)
with(callaert,
     logrank_trafo(Surv(time)))

## Asymptotic Mantel-Cox test (p = 0.0523)
survdiff(Surv(time) ~ group, data = callaert)

## Exact logrank test using mid-ranks (p = 0.0505)
logrank_test(Surv(time) ~ group, data = callaert, distribution = "exact")

## Exact logrank test using average-scores (p = 0.0468)
logrank_test(Surv(time) ~ group, data = callaert, distribution = "exact",
             ties.method = "average-scores")


## Lung cancer data (StatXact 9 manual, p. 213, Tab. 7.19)
lungcancer <- data.frame(
    time = c(257, 476, 355, 1779, 355,
             191, 563, 242, 285, 16, 16, 16, 257, 16),
    event = c(0, 0, 1, 1, 0,
              1, 1, 1, 1, 1, 1, 1, 1, 1),
    group = factor(rep(1:2, c(5, 9)),
                   labels = c("newdrug", "control"))
)

## Logrank scores using average-scores (StatXact 9 manual, p. 214)
with(lungcancer,
     logrank_trafo(Surv(time, event), ties.method = "average-scores"))

## Exact logrank test using average-scores (StatXact 9 manual, p. 215)
logrank_test(Surv(time, event) ~ group, data = lungcancer,
             distribution = "exact", ties.method = "average-scores")

## Exact Prentice test using average-scores (StatXact 9 manual, p. 222)
logrank_test(Surv(time, event) ~ group, data = lungcancer,
             distribution = "exact", ties.method = "average-scores",
             type = "Prentice")


## Approximative (Monte Carlo) versatile test (Lee, 1996)
rho.gamma <- expand.grid(rho = seq(0, 2, 1), gamma = seq(0, 2, 1))
lee_trafo <- function(y)
    logrank_trafo(y, ties.method = "average-scores",
                  type = "Fleming-Harrington",
                  rho = rho.gamma["rho"], gamma = rho.gamma["gamma"])

it <- independence_test(Surv(time, event) ~ group, data = lungcancer,
                        distribution = approximate(nresample = 10000),
                        ytrafo = function(data)
                            trafo(data, surv_trafo = lee_trafo))
pvalue(it, method = "step-down")

coin documentation built on June 30, 2026, 9:06 a.m.