Fit a flexible parametric regression model to possibly right-censored, left-truncated data.

1 |

`formula` |
an object of class “ |

`data` |
an optional data frame, list or environment (or object coercible by |

`weights` |
optional weights to be used during the fitting process. |

`df` |
the degrees of freedom of the B-spline that describes |

`degree` |
the degree of the polynomial of the B-spline that describes |

`knots` |
the |

`maxit` |
maximum number of iterations of the Newton-Raphson algorithm. If missing, a default value based on the number of free parameters is used. |

`tol` |
tolerance for the Newton-Raphson algorithm. |

This function fits a flexible parametric model to possibly right-censored, left-truncated outcomes,
usually survival data. Right censoring occurs when instead of some outcome *T*, one can only observe
*Y = min(T,C)* and *d = I(T ≤ C)*, the indicator of failure.
Left truncation occurs when *Y* is only observed subject to *Y > Z*.
Typically, *Z* is the time at enrollement of subjects with a disease. Those who died
before enrollment (*Y < Z*) cannot be observed, thus generating selection bias.

The `formula`

must be of the form

`Surv(y,d) ~ x`

, with censored data;`Surv(z,y,d) ~ x`

, with censored, truncated data;`Surv(y) ~ x`

, is also allowed and denotes non-censored, non-truncated data.

In the above, `x`

is a set of predictors, `y`

is the response variable,
`z`

truncation times (`z < y`

), and `d`

the indicator of failure (1 = event, 0 = censored).

In `flexPM`

, model fitting is implemented as follows.
First, the response variable is pre-trasformed using a smoothed version of
`y = qlogis(rank(y)/(n + 1))`. Second, parameter estimation is carried out on the transformed variable.
Maximum likelihood estimators are computed via Newton-Raphson algorithm, using the following flexible distribution:

*F(t|x) = 1/(1 + exp{-[u(t) - m(x)]/s(x)}).*

In the above, *m(x)* and *log s(x)* are modeled as specified by
`formula`

, while *u(.)* is a B-spline function built via `spline.des`

(see `bs`

).
You can choose the degrees of freedom `df` and the `degree` of the spline basis. The model parameters are (a) the coefficients
describing the effect of covariates *x* on *m(x)*
and *log s(x)*,
and (b) the coefficients of the B-spline basis that defines the unknown transformation *u(.)*, on which suitable constraints are imposed to ensure monotonicity.

An object of class “`flexPM`

”, which is a list with the following items:

`converged` |
logical value indicating the convergence status of the algorithm. |

`n.it` |
the number of iterations. |

`n` |
the number of observations. |

`n.free.par` |
the number of free parameters in the model. |

`loglik` |
the values of the log-likelihood at convergence. |

`AIC, BIC` |
the Akaike and Bayesian information criterion. |

`mf` |
the used model frame. |

`call` |
the matched call. |

The model parameters are returned as attributes of `mf`

and are not
user-level objects. The model is intended to be used for prediction and not for inference. The hessian matrix
is not returned. The number of free parameters is `df + 2*ncol(x) - 1`

,
and not `df + 2*ncol(x)`

, because the scale of *u(.)* and that of *F(t|x)*
are exchangeable and thus one coefficient of *u(.)* is constrained to be `1`.

The accessor functions `summary`

, `nobs`

, `logLik`

, `AIC`

, and `BIC`

can be used
to extract information from the model. The fit is only intended for prediction: use `predict.flexPM`

.

The model is fitted assuming that an unknown transformation of the response
variable follows a Logistic distribution. The choice of the “kernel” distribution is only due to computational convenience and does not reflect any prior belief. Provided that *u(.)* is sufficiently flexible, asymmetric or multi-modal distributions can be fitted. Pre-transforming the response variable (added in flexPM 2.0) removes the outliers, generates a more symmetric distribution, and frequently permits achieving a good fit using fewer knots for *u(.)*.

This flexible parametric approach generally outoperforms fully nonparametric estimators like local Kaplan-Meier, at a cost of a relatively small bias.

Paolo Frumento paolo.frumento@ki.se

`predict.flexPM`

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 | ```
# Simulated data from a normal distribution
n <- 1000
x1 <- rnorm(n)
x2 <- runif(n)
# non-censored, non-truncated data
t <- rnorm(n, 2 + 3*x1, 1 + x2) # time variable
m1 <- flexPM(Surv(t) ~ x1 + x2)
# right-censored data
c <- rnorm(n,3,3) # censoring variable
y <- pmin(t,c) # observed outcome
d <- (t <= c) # 1 = observed, 0 = censored
m2 <- flexPM(Surv(y,d) ~ x1 + x2)
# right-censored, left-truncated data
z <- rnorm(n,-3,3) # truncating variable
w <- which(y > z) # only observe if y > z
y <- y[w]; d <- d[w]; z <- z[w]; x1 <- x1[w]; x2 <- x2[w]
m3 <- flexPM(Surv(z,y,d) ~ x1 + x2)
################################################################
# m1, m2, m3 are not intended to be interpreted.
# Use predict() to obtain predictions.
# Note that the following are identical:
# summary(flexPM(Surv(y) ~ x1 + x2))
# summary(flexPM(Surv(y, rep(1,length(y))) ~ x1 + x2))
# summary(flexPM(Surv(rep(-Inf,length(y)), y, rep(1,length(y))) ~ x1 + x2))
################################################################
# Use the logLik, AIC and BIC for model selection
# (choice of df, inclusion/exclusion of covariates)
models <- list(
flexPM(Surv(z,y,d) ~ x1 + x2, df = 1, degree = 1),
flexPM(Surv(z,y,d) ~ x1 + x2, df = 3),
flexPM(Surv(z,y,d) ~ x1 + x2 + I(x1^2) + I(x2^2), df = 1, degree = 1),
flexPM(Surv(z,y,d) ~ x1 + x2 + I(x1^2) + I(x2^2), df = 3),
flexPM(Surv(z,y,d) ~ x1 * x2, df = 5)
)
my_final_model <- models[[which.min(sapply(models, function(x) x$AIC))]]
summary(my_final_model)
``` |

Questions? Problems? Suggestions? Tweet to @rdrrHQ or email at ian@mutexlabs.com.

All documentation is copyright its authors; we didn't write any of that.