Description Usage Arguments Details Value Note Author(s) See Also Examples
Fit a flexible parametric regression model to possibly right-censored, left-truncated data.
1 |
formula |
an object of class “ |
data |
an optional data frame, list or environment (or object coercible by |
weights |
optional weights to be used during the fitting process. |
df |
the degrees of freedom of the B-spline that describes u(T) (see ‘Details’). |
degree |
the degree of the polynomial of the B-spline that describes u(T). |
knots |
the internal knots of the B-spline that describes u(T). |
maxit |
maximum number of iterations of the Newton-Raphson algorithm. If missing, a default value based on the number of free parameters is used. |
tol |
tolerance for the Newton-Raphson algorithm. |
This function fits a flexible parametric model to possibly right-censored, left-truncated outcomes, usually survival data. Right censoring occurs when instead of some outcome T, one can only observe Y = min(T,C) and d = I(T ≤ C), the indicator of failure. Left truncation occurs when Y is only observed subject to Y > Z. Typically, Z is the time at enrollement of subjects with a disease. Those who died before enrollment (Y < Z) cannot be observed, thus generating selection bias.
The formula
must be of the form
Surv(y,d) ~ x
, with censored data;
Surv(z,y,d) ~ x
, with censored, truncated data;
Surv(y) ~ x
, is also allowed and denotes non-censored, non-truncated data.
In the above, x
is a set of predictors, y
is the response variable,
z
truncation times (z < y
), and d
the indicator of failure (1 = event, 0 = censored).
In flexPM
, model fitting is implemented as follows.
First, the response variable is pre-trasformed using a smoothed version of
y = qlogis(rank(y)/(n + 1)). Second, parameter estimation is carried out on the transformed variable.
Maximum likelihood estimators are computed via Newton-Raphson algorithm, using the following flexible distribution:
F(t|x) = 1/(1 + exp{-[u(t) - m(x)]/s(x)}).
In the above, m(x) and log s(x) are modeled as specified by
formula
, while u(.) is a B-spline function built via spline.des
(see bs
).
You can choose the degrees of freedom df and the degree of the spline basis. The model parameters are (a) the coefficients
describing the effect of covariates x on m(x)
and log s(x),
and (b) the coefficients of the B-spline basis that defines the unknown transformation u(.), on which suitable constraints are imposed to ensure monotonicity.
An object of class “flexPM
”, which is a list with the following items:
converged |
logical value indicating the convergence status of the algorithm. |
n.it |
the number of iterations. |
n |
the number of observations. |
n.free.par |
the number of free parameters in the model. |
loglik |
the values of the log-likelihood at convergence. |
AIC, BIC |
the Akaike and Bayesian information criterion. |
mf |
the used model frame. |
call |
the matched call. |
The model parameters are returned as attributes of mf
and are not
user-level objects. The model is intended to be used for prediction and not for inference. The hessian matrix
is not returned. The number of free parameters is df + 2*ncol(x) - 1
,
and not df + 2*ncol(x)
, because the scale of u(.) and that of F(t|x)
are exchangeable and thus one coefficient of u(.) is constrained to be 1.
The accessor functions summary
, nobs
, logLik
, AIC
, and BIC
can be used
to extract information from the model. The fit is only intended for prediction: use predict.flexPM
.
The model is fitted assuming that an unknown transformation of the response variable follows a Logistic distribution. The choice of the “kernel” distribution is only due to computational convenience and does not reflect any prior belief. Provided that u(.) is sufficiently flexible, asymmetric or multi-modal distributions can be fitted. Pre-transforming the response variable (added in flexPM 2.0) removes the outliers, generates a more symmetric distribution, and frequently permits achieving a good fit using fewer knots for u(.).
This flexible parametric approach generally outoperforms fully nonparametric estimators like local Kaplan-Meier, at a cost of a relatively small bias.
Paolo Frumento paolo.frumento@ki.se
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 | # Simulated data from a normal distribution
n <- 1000
x1 <- rnorm(n)
x2 <- runif(n)
# non-censored, non-truncated data
t <- rnorm(n, 2 + 3*x1, 1 + x2) # time variable
m1 <- flexPM(Surv(t) ~ x1 + x2)
# right-censored data
c <- rnorm(n,3,3) # censoring variable
y <- pmin(t,c) # observed outcome
d <- (t <= c) # 1 = observed, 0 = censored
m2 <- flexPM(Surv(y,d) ~ x1 + x2)
# right-censored, left-truncated data
z <- rnorm(n,-3,3) # truncating variable
w <- which(y > z) # only observe if y > z
y <- y[w]; d <- d[w]; z <- z[w]; x1 <- x1[w]; x2 <- x2[w]
m3 <- flexPM(Surv(z,y,d) ~ x1 + x2)
################################################################
# m1, m2, m3 are not intended to be interpreted.
# Use predict() to obtain predictions.
# Note that the following are identical:
# summary(flexPM(Surv(y) ~ x1 + x2))
# summary(flexPM(Surv(y, rep(1,length(y))) ~ x1 + x2))
# summary(flexPM(Surv(rep(-Inf,length(y)), y, rep(1,length(y))) ~ x1 + x2))
################################################################
# Use the logLik, AIC and BIC for model selection
# (choice of df, inclusion/exclusion of covariates)
models <- list(
flexPM(Surv(z,y,d) ~ x1 + x2, df = 1, degree = 1),
flexPM(Surv(z,y,d) ~ x1 + x2, df = 3),
flexPM(Surv(z,y,d) ~ x1 + x2 + I(x1^2) + I(x2^2), df = 1, degree = 1),
flexPM(Surv(z,y,d) ~ x1 + x2 + I(x1^2) + I(x2^2), df = 3),
flexPM(Surv(z,y,d) ~ x1 * x2, df = 5)
)
my_final_model <- models[[which.min(sapply(models, function(x) x$AIC))]]
summary(my_final_model)
|
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.