Flexible Modeling of Survival Data

Description

Fit a flexible parametric regression model to possibly right-censored, left-truncated data.

Usage

1
flexPM(formula, data, weights, df = 3, degree = 3, knots, maxit, tol = 1e-6)

Arguments

formula

an object of class “formula”: a symbolic description of the model to be fitted. The response must be a survival object as returned by the Surv function. See ‘Details’.

data

an optional data frame, list or environment (or object coercible by as.data.frame to a data frame) containing the variables in the model. If missing, the variables are taken from environment(formula).

weights

optional weights to be used during the fitting process.

df

the degrees of freedom of the B-spline that describes u(T) (see ‘Details’).

degree

the degree of the polynomial of the B-spline that describes u(T).

knots

the internal knots of the B-spline that describes u(T).

maxit

maximum number of iterations of the Newton-Raphson algorithm. If missing, a default value based on the number of free parameters is used.

tol

tolerance for the Newton-Raphson algorithm.

Details

This function fits a flexible parametric model to possibly right-censored, left-truncated outcomes, usually survival data. Right censoring occurs when instead of some outcome T, one can only observe Y = min(T,C) and d = I(T ≤ C), the indicator of failure. Left truncation occurs when Y is only observed subject to Y > Z. Typically, Z is the time at enrollement of subjects with a disease. Those who died before enrollment (Y < Z) cannot be observed, thus generating selection bias.

The formula must be of the form

  • Surv(y,d) ~ x, with censored data;

  • Surv(z,y,d) ~ x, with censored, truncated data;

  • Surv(y) ~ x, is also allowed and denotes non-censored, non-truncated data.

In the above, x is a set of predictors, y is the response variable, z truncation times (z < y), and d the indicator of failure (1 = event, 0 = censored).

In flexPM, model fitting is implemented as follows. First, the response variable is pre-trasformed using a smoothed version of y = qlogis(rank(y)/(n + 1)). Second, parameter estimation is carried out on the transformed variable. Maximum likelihood estimators are computed via Newton-Raphson algorithm, using the following flexible distribution:

F(t|x) = 1/(1 + exp{-[u(t) - m(x)]/s(x)}).

In the above, m(x) and log s(x) are modeled as specified by formula, while u(.) is a B-spline function built via spline.des (see bs). You can choose the degrees of freedom df and the degree of the spline basis. The model parameters are (a) the coefficients describing the effect of covariates x on m(x) and log s(x), and (b) the coefficients of the B-spline basis that defines the unknown transformation u(.), on which suitable constraints are imposed to ensure monotonicity.

Value

An object of class “flexPM”, which is a list with the following items:

converged

logical value indicating the convergence status of the algorithm.

n.it

the number of iterations.

n

the number of observations.

n.free.par

the number of free parameters in the model.

loglik

the values of the log-likelihood at convergence.

AIC, BIC

the Akaike and Bayesian information criterion.

mf

the used model frame.

call

the matched call.

The model parameters are returned as attributes of mf and are not user-level objects. The model is intended to be used for prediction and not for inference. The hessian matrix is not returned. The number of free parameters is df + 2*ncol(x) - 1, and not df + 2*ncol(x), because the scale of u(.) and that of F(t|x) are exchangeable and thus one coefficient of u(.) is constrained to be 1.

The accessor functions summary, nobs, logLik, AIC, and BIC can be used to extract information from the model. The fit is only intended for prediction: use predict.flexPM.

Note

The model is fitted assuming that an unknown transformation of the response variable follows a Logistic distribution. The choice of the “kernel” distribution is only due to computational convenience and does not reflect any prior belief. Provided that u(.) is sufficiently flexible, asymmetric or multi-modal distributions can be fitted. Pre-transforming the response variable (added in flexPM 2.0) removes the outliers, generates a more symmetric distribution, and frequently permits achieving a good fit using fewer knots for u(.).

This flexible parametric approach generally outoperforms fully nonparametric estimators like local Kaplan-Meier, at a cost of a relatively small bias.

Author(s)

Paolo Frumento paolo.frumento@ki.se

See Also

predict.flexPM

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
# Simulated data from a normal distribution 

n <- 1000
x1 <- rnorm(n)
x2 <- runif(n)


# non-censored, non-truncated data

t <- rnorm(n, 2 + 3*x1, 1 + x2) # time variable
m1 <- flexPM(Surv(t) ~ x1 + x2)


# right-censored data

c <- rnorm(n,3,3) # censoring variable
y <- pmin(t,c)    # observed outcome
d <- (t <= c)     # 1 = observed, 0 = censored
m2 <- flexPM(Surv(y,d) ~ x1 + x2)


# right-censored, left-truncated data

z <- rnorm(n,-3,3) # truncating variable
w <- which(y > z)  # only observe if y > z
y <- y[w]; d <- d[w]; z <- z[w]; x1 <- x1[w]; x2 <- x2[w]
m3 <- flexPM(Surv(z,y,d) ~ x1 + x2)

################################################################

# m1, m2, m3 are not intended to be interpreted.
# Use predict() to obtain predictions.

# Note that the following are identical:
# summary(flexPM(Surv(y) ~ x1 + x2))
# summary(flexPM(Surv(y, rep(1,length(y))) ~ x1 + x2))
# summary(flexPM(Surv(rep(-Inf,length(y)), y, rep(1,length(y))) ~ x1 + x2))

################################################################

# Use the logLik, AIC and BIC for model selection 
# (choice of df, inclusion/exclusion of covariates)

models <- list(
  flexPM(Surv(z,y,d) ~ x1 + x2, df = 1, degree = 1),
  flexPM(Surv(z,y,d) ~ x1 + x2, df = 3),
  flexPM(Surv(z,y,d) ~ x1 + x2 + I(x1^2) + I(x2^2), df = 1, degree = 1),
  flexPM(Surv(z,y,d) ~ x1 + x2 + I(x1^2) + I(x2^2), df = 3),
  flexPM(Surv(z,y,d) ~ x1 * x2, df = 5)
)

my_final_model <- models[[which.min(sapply(models, function(x) x$AIC))]]
summary(my_final_model)

Want to suggest features or report bugs for rdrr.io? Use the GitHub issue tracker.