survPen: (Excess) hazard model with (multidimensional) penalized...
In survPen: Multidimensional Penalized Splines for (Excess) Hazard Models, Relative Mortality Ratio Models and Marginal Intensity Models

survPen

R Documentation

(Excess) hazard model with (multidimensional) penalized splines and integrated smoothness estimation

Description

Please see the survival_analysis_with_survPen vignette for a thorough description.

Fits an (excess) hazard model with (multidimensional) penalized splines allowing for time-dependent effects, non-linear effects and interactions between several continuous covariates. The linear predictor is specified on the logarithm of the (excess) hazard. Smooth terms are represented using cubic regression splines with associated quadratic penalties. For multidimensional smooths, tensor product splines or tensor product interactions are available. Smoothness is estimated automatically by optimizing one of two criteria: Laplace approximate marginal likelihood (LAML) or likelihood cross-validation (LCV). When specifying the model's formula, no distinction is made between the part relative to the form of the baseline hazard and the one relative to the effects of the covariates. Thus, time-dependent effects are naturally specified as interactions with some function of time via "*" or ":". See the examples below for more details. The main functions of the survPen package are survPen, smf, tensor, tint and rd. The first one fits the model while the other four are constructors for penalized splines.

The user must be aware that the survPen package does not depend on mgcv. Thus, the spline types offered by mgcv beyond cubic regression splines (such as thin plate regression splines or P-splines) are not (yet) available in survPen.

Usage

survPen(
  formula,
  data,
  t1,
  t0 = NULL,
  event,
  expected = NULL,
  lambda = NULL,
  rho.ini = NULL,
  max.it.beta = 200,
  max.it.rho = 30,
  beta.ini = NULL,
  detail.rho = FALSE,
  detail.beta = FALSE,
  n.legendre = NULL,
  method = "LAML",
  tol.beta = 1e-04,
  tol.rho = 1e-04,
  step.max = 5,
  type = "overall",
  cluster = NULL
)

Arguments

`formula`	formula object specifying the model. Penalized terms are specified using `smf` (comparable to `s(...,bs="cr")` in `mgcv`), `tensor` (comparable to `te(...,bs="cr")` in `mgcv`), `tint` (comparable to `ti(...,bs="cr")` in `mgcv`), or `rd` (comparable to `s(...,bs="re")` in `mgcv`).
`data`	an optional data frame containing the variables in the model
`t1`	vector of follow-up times or name of the column in `data` containing follow-up times
`t0`	vector of origin times or name of the column in `data` containing origin times; used to account for left truncation; default is NULL, in which case it will be a vector of zeroes
`event`	vector of right-censoring indicators or name of the column in `data` containing right-censoring indicators; 1 if the event occurred and 0 otherwise
`expected`	(for net survival only) vector of expected hazard or name of the column in `data` containing expected hazard; default is NULL, in which case overall survival will be estimated
`lambda`	vector of smoothing parameters; default is NULL when it is to be estimated by LAML or LCV
`rho.ini`	vector of initial log smoothing parameters; default is NULL, in which case every initial log lambda will be -1
`max.it.beta`	maximum number of iterations to reach convergence in the regression parameters; default is 200
`max.it.rho`	maximum number of iterations to reach convergence in the smoothing parameters; default is 30
`beta.ini`	vector of initial regression parameters; default is NULL, in which case the first beta will be `log(sum(event)/sum(t1))` and the others will be zero (except if there are "by" variables or if there is a piecewise constant hazard specification in which cases all betas are set to zero)
`detail.rho`	if TRUE, details concerning the optimization process in the smoothing parameters are displayed; default is FALSE
`detail.beta`	if TRUE, details concerning the optimization process in the regression parameters are displayed; default is FALSE
`n.legendre`	number of Gauss-Legendre quadrature nodes to be used to compute the cumulative hazard; default is NULL. If not supplied the value is set to 20 for (excess) hazard models and 10 for relative mortality ratio models
`method`	criterion used to select the smoothing parameters. Should be "LAML" or "LCV"; default is "LAML"
`tol.beta`	convergence tolerance for regression parameters; default is `1e-04`. See `NR.beta` for details
`tol.rho`	convergence tolerance for smoothing parameters; default is `1e-04`. See `NR.rho` for details
`step.max`	maximum absolute value possible for any component of the step vector (on the log smoothing parameter scale) in LCV or LAML optimization; default is 5. If necessary, consider lowering this value to achieve convergence
`type`	should be either 'overall' for hazard regression, 'net' for excess hazard regression, or 'mult' for relative mortality ratio regression
`cluster`	cluster variable for marginal hazard (intensity) models

Details

In time-to-event analysis, we may deal with one or several continuous covariates whose functional forms, time-dependent effects and interaction structure are challenging. One possible way to deal with these effects and interactions is to use the classical approximation of the survival likelihood by a Poisson likelihood. By artificially splitting the data, the package mgcv can then be used to fit penalized hazard models (Remontet et al. 2018). The problem with this option is that the setup is rather complex and the method can fail with huge datasets (before splitting).

Wood et al. (2016) provided a general penalized framework that made smooth function estimation available to a wide variety of models. They proposed to estimate smoothing parameters by maximizing a Laplace approximate marginal likelihood (LAML) criterion and demonstrated how statistical consistency is maintained by doing so. The survPen function implements the framework described by Wood et al. (2016) for modelling time-to-event data without requiring data splitting and Poisson likelihood approximation. The effects of continuous covariates are represented using low rank spline bases with associated quadratic penalties. The survPen function makes it possible to account simultaneously for time-dependent effects, non-linear effects and interactions between several continuous covariates without the need to build a possibly demanding model-selection procedure.

Besides LAML, a likelihood cross-validation (LCV) criterion (O'Sullivan 1988) can be used for smoothing parameter estimation. First and second derivatives of LCV with respect to the smoothing parameters are implemented so that LCV optimization is computationally equivalent to the LAML optimization proposed by Wood et al. (2016). In practice, LAML optimization is generally both a bit faster and a bit more stable so it is used as default. For m covariates (x_1,\ldots,x_m), if we denote by h(t,x_1,\ldots,x_m) the hazard at time t, the hazard model is the following:

log[h(t,x_1,\ldots,x_m)]=\sum_j g_j(t,x_1,\ldots,x_m)

where each g_j is either the marginal basis of a specific covariate or a tensor product smooth of any number of covariates. The marginal bases of the covariates are represented as natural (or restricted) cubic splines (as in function ns from library splines) with associated quadratic penalties. Fully parametric (unpenalized) terms for the effects of covariates are also possible (see the examples below). Each g_j is then associated with zero, one or several smoothing parameters. The estimation procedure is based on outer Newton-Raphson iterations for the smoothing parameters and on inner Newton-Raphson iterations for the regression parameters (see Wood et al. 2016). Estimation of the regression parameters in the inner algorithm is by direct maximization of the penalized likelihood of the survival model, therefore avoiding data augmentation and Poisson likelihood approximation. The cumulative hazard included in the log-likelihood is approximated by Gauss-Legendre quadrature for numerical stability.
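The quadrature step mentioned above can be illustrated with a minimal base R sketch. It is not the package's internal code: the helper names gauss_legendre and cumhaz_gl are hypothetical, the nodes and weights are obtained here with the Golub-Welsch eigenvalue construction (an assumption made only to keep the example self-contained), and the log-linear hazard is made up so that the exact cumulative hazard is known.

```r
# Gauss-Legendre nodes and weights on [-1, 1] (Golub-Welsch construction)
gauss_legendre <- function(n) {
  k <- seq_len(n - 1)
  b <- k / sqrt(4 * k^2 - 1)          # off-diagonal of the Jacobi matrix
  J <- matrix(0, n, n)
  J[cbind(k, k + 1)] <- b
  J[cbind(k + 1, k)] <- b
  e <- eigen(J, symmetric = TRUE)
  list(nodes = e$values, weights = 2 * e$vectors[1, ]^2)
}

# Cumulative hazard between t0 and t1 for a hazard function haz(t)
cumhaz_gl <- function(haz, t0, t1, n = 20) {
  gl   <- gauss_legendre(n)
  half <- (t1 - t0) / 2
  mid  <- (t1 + t0) / 2
  # map the nodes from [-1, 1] to [t0, t1] and sum the weighted hazards
  half * sum(gl$weights * haz(half * gl$nodes + mid))
}

# Hypothetical hazard: log h(t) = -2 + 0.3 * t
haz    <- function(t) exp(-2 + 0.3 * t)
approx <- cumhaz_gl(haz, t0 = 0, t1 = 5, n = 20)
exact  <- exp(-2) * (exp(0.3 * 5) - 1) / 0.3

c(approx = approx, exact = exact)
```

With a smooth hazard such as this one, 20 nodes (the default reported for n.legendre in (excess) hazard models) should reproduce the exact integral to high accuracy.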

Value

Object of class "survPen" (see survPenObject for details)

by variables

The smf, tensor and tint terms used to specify smooths accept an argument by. This by argument allows for building varying-coefficient models, i.e. for letting smooths interact with factors or parametric terms. If a by variable is numeric, then its ith element multiplies the ith row of the model matrix corresponding to the smooth term concerned. If a by variable is a factor then it generates an indicator vector for each level of the factor, unless it is an ordered factor. In the non-ordered case, the model matrix for the smooth term is then replicated for each factor level, and each copy has its rows multiplied by the corresponding rows of its indicator variable. The smoothness penalties are also duplicated for each factor level. In short, a different smooth is generated for each factor level. The main interest of by variables over separate models is the same.rho argument (for smf, tensor and tint), which allows forcing all smooths to have the same smoothing parameter(s). Ordered by variables are handled in the same way, except that no smooth is generated for the first level of the ordered factor. This is useful if you are interested in differences from a reference level.

See the survival_analysis_with_survPen vignette for more details.
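The way a by variable acts on the model matrix can be sketched in base R as follows. This only mimics the description above and is not how survPen builds its matrices internally: an unpenalized ns basis stands in for the smooth, and the data (x, z, g) are made up for illustration.

```r
library(splines)

x <- seq(0, 5, length.out = 6)
z <- c(1, 2, 1, 2, 1, 2)                       # hypothetical numeric by variable
g <- factor(c("a", "b", "a", "b", "a", "b"))   # hypothetical factor by variable

# Stand-in for the model matrix of a smooth term of x
X <- ns(x, df = 3)

# Numeric by: the ith element of z multiplies the ith row of X
X.num <- X * z

# Factor by: one copy of X per level, each copy multiplied row-wise
# by the indicator variable of that level
ind  <- model.matrix(~ g - 1)
X.by <- do.call(cbind, lapply(seq_len(ncol(ind)), function(j) X * ind[, j]))

dim(X.by)   # 6 rows, 3 columns per level
```

Rows belonging to level "b" are zero in the block of columns built for level "a" (and conversely), so each level gets its own set of coefficients.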

Random effects

I.i.d. random effects can be specified using penalization. Indeed, the ridge penalty is equivalent to an assumption that the regression parameters are i.i.d. normal random effects. Thus, it is easy to fit a frailty hazard model. For example, consider the model term rd(clust) which will result in a model matrix component corresponding to model.matrix(~clust-1) being added to the model matrix for the whole model. The associated regression parameters are assumed i.i.d. normal, with unknown variance (to be estimated). This assumption is equivalent to an identity penalty matrix (i.e. a ridge penalty) on the regression parameters. The unknown smoothing parameter \lambda associated with the term rd(clust) is directly linked to the unknown variance \sigma^2: \sigma^2 = \frac{1}{\lambda * S.scale}. Then, the estimated log standard deviation is: log(\hat{\sigma})=-0.5*log(\hat{\lambda})-0.5*log(S.scale). And the estimated variance of the log standard deviation is: Var[log(\hat{\sigma})]=0.25*Var[log(\hat{\lambda})]=0.25*inv.Hess.rho. See the survival_analysis_with_survPen vignette for more details.
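The relations above between the smoothing parameter and the random-effect standard deviation can be checked numerically. The sketch below uses made-up values for the estimated smoothing parameter, for S.scale and for the variance of the log smoothing parameter (inv.Hess.rho); in a real analysis these would be taken from the fitted object, as explained in the vignette.

```r
# Hypothetical values, for illustration only
lambda.hat   <- 4      # estimated smoothing parameter of rd(clust)
S.scale      <- 0.25   # penalty scaling factor
inv.Hess.rho <- 0.16   # estimated Var[log(lambda.hat)]

# Variance of the random effect
sigma2.hat <- 1 / (lambda.hat * S.scale)

# Log standard deviation and its standard error
log.sigma    <- -0.5 * log(lambda.hat) - 0.5 * log(S.scale)
se.log.sigma <- sqrt(0.25 * inv.Hess.rho)

# 95% confidence interval for sigma, built on the log scale
ci.sigma <- exp(log.sigma + c(-1, 1) * qnorm(0.975) * se.log.sigma)

c(sigma = exp(log.sigma), lower = ci.sigma[1], upper = ci.sigma[2])
```

Building the interval on the log scale and exponentiating keeps both bounds positive, which is why the variance formula is given for log(\hat{\sigma}) rather than for \hat{\sigma} itself.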

This approach allows implementing commonly used random effect structures. For example if g is a factor then rd(g) produces a random parameter for each level of g, the random parameters being i.i.d. normal. If g is a factor and x is numeric, then rd(g,x) produces an i.i.d. normal random slope relating the response to x for each level of g. Thus, random effects treated as penalized splines allow specifying frailty (excess) hazard models (Charvat et al. 2016). For each individual i from cluster (usually geographical unit) j, a possible model would be:

log[h(t_{ij},x_{ij1},\ldots,x_{ijm})]=\sum_k g_k(t_{ij},x_{ij1},\ldots,x_{ijm}) + w_j

where w_j follows a normal distribution with mean 0. The random effect associated with the cluster variable is specified with the model term rd(cluster). We could also specify a random effect depending on age for example with the model term rd(cluster,age). u_j = exp(w_j) is known as the shared frailty.

See the survival_analysis_with_survPen vignette for more details.

Excess hazard model

When studying the survival of patients who suffer from a common pathology we may be interested in the concept of excess mortality that represents the mortality due to that pathology. For example, in cancer epidemiology, individuals may die from cancer or from another cause. The problem is that the cause of death is often either unavailable or unreliable. Supposing that the mortality due to other causes may be obtained from the total mortality of the general population (called expected mortality for cancer patients), we can define the concept of excess mortality. The excess mortality is directly linked to the concept of net survival, which would be the observed survival if patients could not die from other causes. Therefore, when such competing events are present, one may choose to fit an excess hazard model instead of a classical hazard model. Flexible excess hazard models have already been proposed (for example, see Remontet et al. 2007, Charvat et al. 2016) but none of them deals with a penalized framework (in a non-fully Bayesian setting). Excess mortality can be estimated supposing that, in patients suffering from a common pathology, mortality due to causes other than the pathology can be obtained from the (all cause) mortality of the general population; the latter is referred to as the expected mortality h_P. The mortality observed in the patients (h_O) is actually decomposed as the sum of h_P and the excess mortality due to the pathology (h_E). This may be written as:

h_O(t,x)=h_E(t,x)+h_P(a+t,z)

In that equation, t is the time since cancer diagnosis, a is the age at diagnosis, h_P is the mortality of the general population at age a+t given demographical characteristics z (h_P is considered known and available from national statistics), and x is a vector of variables that may have an effect on h_E. Including the age in the model is necessary in order to deal with the informative censoring due to other causes of death. Thus, for m covariates (x_1,\ldots,x_m), if we denote by h_E(t,x_1,\ldots,x_m) the excess hazard at time t, the excess hazard model is the following:

log[h_E(t,x_1,\ldots,x_m)]=\sum_j g_j(t,x_1,\ldots,x_m)
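To make the role of the expected hazard concrete, here is a minimal sketch with a constant excess hazard and made-up data. It assumes the usual form of the excess hazard log-likelihood, in which each individual contributes event * log(h_E + h_P) minus the cumulative excess hazard (the cumulative expected hazard does not depend on the parameters and is dropped). The function name loglik_excess is hypothetical and this is not survPen's code.

```r
# Hypothetical data: follow-up times, event indicators and expected
# hazards evaluated at the end of follow-up
t1       <- c(0.5, 1.2, 2.0, 3.1, 4.4, 5.0)
event    <- c(1, 1, 0, 1, 0, 0)
expected <- c(0.01, 0.02, 0.015, 0.03, 0.02, 0.025)

# Log-likelihood for a constant excess hazard exp(log_hE)
loglik_excess <- function(log_hE, t1, event, expected) {
  hE <- exp(log_hE)
  sum(event * log(hE + expected) - hE * t1)
}

# Overall hazard model: expected hazard set to zero
fit.overall <- optimize(loglik_excess, interval = c(-10, 2), maximum = TRUE,
                        t1 = t1, event = event, expected = 0)

# Excess hazard model: same likelihood with the expected hazard included
fit.excess <- optimize(loglik_excess, interval = c(-10, 2), maximum = TRUE,
                       t1 = t1, event = event, expected = expected)

c(overall = exp(fit.overall$maximum), excess = exp(fit.excess$maximum))
```

With the expected hazard set to zero the maximizer is log(sum(event)/sum(t1)), the same quantity used as the default first value of beta.ini. Once part of the mortality is attributed to other causes, the estimated excess hazard is lower than the overall hazard.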

Relative mortality ratio model

Another important feature of the survPen package is that it allows fitting penalized relative mortality ratio models.

As we discussed above, the excess mortality setting considers that the mortality (all causes) observed in the patients (h_O) is actually decomposed as the sum of the expected mortality h_P and the excess mortality due to the pathology (h_E).

This may be written as:

h_O(t,x)=h_E(t,x)+h_P(a+t,z)

One limitation of such a decomposition is that h_E is assumed to be positive, and this assumption is sometimes not met. For example, in prostate cancer patients with low stages at diagnosis, we observe an 'undermortality' due to selection effects and better overall medical care. In that case, the excess mortality is actually negative and the net survival setting fails to describe the reality of those patients. Besides, the excess mortality setting considers the studied disease as an independent cause of death (conditionally on the covariates) compared to the other causes. This point of view is not usually adopted in multiple sclerosis epidemiology for example, where the disease is seen as a comorbidity impacting all pre-existing causes of death. In that case, the observed hazard is decomposed as the product of the population hazard and a relative mortality ratio r.

This may be written as:

h_O(t,x)=r(t,x)*h_P(a+t,z)

This decomposition was first proposed in a modelling framework by Andersen et al. (1985). However, Andersen's model was a non-flexible semi-parametric model.

The survPen package allows modelling the relative mortality ratio r as a multidimensional function of time and covariates. For m covariates (x_1,\ldots,x_m), if we denote by r(t,x_1,\ldots,x_m) the relative mortality ratio at time t, the model is as follows:

log[r(t,x_1,\ldots,x_m)]=\sum_j g_j(t,x_1,\ldots,x_m)

where the g_j functions may be penalized unidimensional splines or penalized tensor product splines. All features described for the (excess) hazard setting still apply when fitting a relative mortality ratio model. One difference lies in the predictions. With a fitted relative mortality ratio model, you can only retrieve the relative mortality ratio and cumulative relative mortality ratio predictions (with CIs), as well as the ratios of relative mortality ratios (with type='HR'). No survival prediction (let alone survival difference) will be directly available because its calculation depends on expected mortality rates.

Finally, one important difference between an excess hazard model and a relative mortality ratio model is data preparation. For an excess hazard model we only need individual data with the expected mortality rate at the time of death. In a relative mortality ratio model, by contrast, the contribution of an individual to the likelihood requires all possible expected mortality rate values during the entire follow-up. Therefore, since the expected mortality rates come from national mortality tables usually available in 1-year intervals, we need to split the original dataset as many times as there are 1-year intervals during each individual's follow-up. The function splitmult will help you get the split dataset from the original one.
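The kind of splitting described above can be pictured with a small base R sketch. The helper split_followup below is hypothetical and written only for illustration; it is not splitmult and makes no claim about splitmult's arguments or output. It cuts one individual's follow-up into 1-year intervals and keeps the event indicator on the last interval only.

```r
# Hypothetical helper: split one follow-up into intervals of a given width
split_followup <- function(id, fu, event, width = 1) {
  n.int  <- max(1, ceiling(fu / width))
  starts <- seq(0, by = width, length.out = n.int)
  stops  <- pmin(starts + width, fu)
  data.frame(id    = id,
             start = starts,
             stop  = stops,
             # the event can only occur in the last interval
             event = c(rep(0, n.int - 1), event))
}

# An individual followed for 2.4 years who dies at the end of follow-up
res <- split_followup(id = 1, fu = 2.4, event = 1)
res
```

Each resulting row can then be matched with the expected mortality rate of the corresponding year of follow-up, which is what the likelihood of the relative mortality ratio model needs.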

See the survival_analysis_with_survPen vignette for more details and an example of analysis.

Marginal hazard (intensity) models with robust standard errors

In presence of correlated time-to-event data (for example recurrent event data), robust standard errors accounting for said correlation need to be derived. The 'survPen' package allows deriving such robust standard errors based on sandwich estimators (often called Huber sandwich estimator, see also Coz et al. submitted to Biostatistics, for an example in the recurrent event setting).

The user only needs to specify the 'cluster' variable defining the statistical units for which repeated observations are available. This specification is performed via the 'cluster' argument.

See the survival_analysis_with_survPen vignette for more details and an example of analysis.
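The idea behind a cluster-robust (sandwich) variance can be sketched on the simplest possible model, a constant hazard with log(h) = beta. This is a generic textbook construction on made-up data, given only to show what changes when scores are summed within statistical units; it is not the package's implementation and all object names are hypothetical.

```r
# Hypothetical recurrent-event data: 3 units with repeated observations
dat <- data.frame(
  unit  = c(1, 1, 1, 2, 2, 3),
  time  = c(0.8, 1.5, 0.4, 2.0, 1.1, 3.0),   # time at risk per observation
  event = c(1, 1, 0, 1, 0, 1)
)

# Maximum likelihood estimate of beta for a constant hazard
beta.hat <- log(sum(dat$event) / sum(dat$time))

# Score contribution of each observation: d_i - exp(beta) * t_i
score <- dat$event - exp(beta.hat) * dat$time

# Hessian of the log-likelihood at beta.hat (a scalar here)
hess <- -exp(beta.hat) * sum(dat$time)

# Model-based variance: inverse of minus the Hessian
var.model <- -1 / hess

# Sandwich variance: scores are first summed within each unit
score.unit <- tapply(score, dat$unit, sum)
var.robust <- sum(score.unit^2) / hess^2

c(model = sqrt(var.model), robust = sqrt(var.robust))
```

The model-based and robust standard errors differ as soon as observations from the same unit are correlated, which is the situation the cluster argument is meant to handle.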

Convergence

No convergence indicator is given. If the function returns an object of class survPen, it means that the algorithm has converged. If convergence issues occur, an error message is displayed. In that case, do not hesitate to use detail.rho and/or detail.beta to see exactly what is going on in the optimization process. To achieve convergence, consider lowering step.max and/or changing rho.ini and beta.ini. If your excess hazard model fails to converge, consider fitting a hazard model and using its estimated parameters as initial values for the excess hazard model. Finally, do not hesitate to change the "method" argument (LCV or LAML) if convergence issues occur.

Other

Be aware that all character variables are transformed to factors before fitting.

References

Andersen, P. K., Borch-Johnsen, K., Deckert, T., Green, A., Hougaard, P., Keiding, N., and Kreiner, S. (1985). A Cox regression model for the relative mortality and its application to diabetes mellitus survival data. Biometrics, 921-932.

Charvat, H., Remontet, L., Bossard, N., Roche, L., Dejardin, O., Rachet, B., ... and Belot, A. (2016), A multilevel excess hazard model to estimate net survival on hierarchical data allowing for non linear and non proportional effects of covariates. Statistics in medicine, 35(18), 3066-3084.

Coz, E., Charvat, H., Maucort-Boulch, D., and Fauvernier, M. (submitted to Biostatistics). Flexible penalized marginal intensity models for recurrent event data.

Fauvernier, M., Roche, L., Uhry, Z., Tron, L., Bossard, N., Remontet, L. and the CENSUR Working Survival Group. Multidimensional penalized hazard model with continuous covariates: applications for studying trends and social inequalities in cancer survival, in revision in the Journal of the Royal Statistical Society, series C.

O'Sullivan, F. (1988), Fast computation of fully automated log-density and log-hazard estimators. SIAM Journal on scientific and statistical computing, 9(2), 363-379.

Remontet, L., Bossard, N., Belot, A., & Esteve, J. (2007), An overall strategy based on regression models to estimate relative survival and model the effects of prognostic factors in cancer survival studies. Statistics in medicine, 26(10), 2214-2228.

Remontet, L., Uhry, Z., Bossard, N., Iwaz, J., Belot, A., Danieli, C., Charvat, H., Roche, L. and CENSUR Working Survival Group (2018) Flexible and structured survival model for a simultaneous estimation of non-linear and non-proportional effects and complex interactions between continuous variables: Performance of this multidimensional penalized spline approach in net survival trend analysis. Stat Methods Med Res. 2018 Jan 1:962280218779408. doi: 10.1177/0962280218779408. [Epub ahead of print].

Wood, S.N., Pya, N. and Saefken, B. (2016), Smoothing parameter and model selection for general smooth models (with discussion). Journal of the American Statistical Association, 111, 1548-1575.

Examples




library(survPen)
data(datCancer) # simulated dataset with 2000 individuals diagnosed with cervical cancer

#-------------------------------------------------------- example 0
# Comparison between restricted cubic splines and penalized restricted cubic splines

library(splines)

# unpenalized
f <- ~ns(fu,knots=c(0.25, 0.5, 1, 2, 4),Boundary.knots=c(0,5))

mod <- survPen(f,data=datCancer,t1=fu,event=dead)

# penalized
f.pen <- ~ smf(fu,knots=c(0,0.25, 0.5, 1, 2, 4,5)) # careful here: the boundary knots are included

mod.pen <- survPen(f.pen,data=datCancer,t1=fu,event=dead)

# predictions

new.time <- seq(0,5,length=100)
pred <- predict(mod,data.frame(fu=new.time))
pred.pen <- predict(mod.pen,data.frame(fu=new.time))

par(mfrow=c(1,1))
plot(new.time,pred$haz,type="l",ylim=c(0,0.2),main="hazard vs time",
xlab="time since diagnosis (years)",ylab="hazard",col="red")
lines(new.time,pred.pen$haz,col="blue3")
legend("topright",legend=c("unpenalized","penalized"),
col=c("red","blue3"),lty=rep(1,2))



#-------------------------------------------------------- example 1
# hazard models with unpenalized formulas compared to a penalized tensor product smooth

library(survPen)
data(datCancer) # simulated dataset with 2000 individuals diagnosed with cervical cancer

# constant hazard model
f.cst <- ~1
mod.cst <- survPen(f.cst,data=datCancer,t1=fu,event=dead)

# piecewise constant hazard model
f.pwcst <- ~pwcst(breaks=seq(0,5,by=0.5))
mod.pwcst <- survPen(f.pwcst,data=datCancer,t1=fu,event=dead)

# linear effect of time
f.lin <- ~fu
mod.lin <- survPen(f.lin,data=datCancer,t1=fu,event=dead)

# linear effect of time and age with proportional effect of age
f.lin.age <- ~fu+age
mod.lin.age <- survPen(f.lin.age,data=datCancer,t1=fu,event=dead)

# linear effect of time and age with time-dependent effect of age (linear)
f.lin.inter.age <- ~fu*age
mod.lin.inter.age <- survPen(f.lin.inter.age,data=datCancer,t1=fu,event=dead)

# cubic B-spline of time with a knot at 1 year, linear effect of age and time-dependent effect
# of age with a quadratic B-spline of time with a knot at 1 year
library(splines)
f.spline.inter.age <- ~bs(fu,knots=c(1),Boundary.knots=c(0,5))+age+
age:bs(fu,knots=c(1),Boundary.knots=c(0,5),degree=2)
# here, bs indicates an unpenalized cubic spline

mod.spline.inter.age <- survPen(f.spline.inter.age,data=datCancer,t1=fu,event=dead)


# tensor of time and age
f.tensor <- ~tensor(fu,age)
mod.tensor <- survPen(f.tensor,data=datCancer,t1=fu,event=dead)


# predictions of the models at age 60

new.time <- seq(0,5,length=100)
pred.cst <- predict(mod.cst,data.frame(fu=new.time))
pred.pwcst <- predict(mod.pwcst,data.frame(fu=new.time))
pred.lin <- predict(mod.lin,data.frame(fu=new.time))
pred.lin.age <- predict(mod.lin.age,data.frame(fu=new.time,age=60))
pred.lin.inter.age <- predict(mod.lin.inter.age,data.frame(fu=new.time,age=60))
pred.spline.inter.age <- predict(mod.spline.inter.age,data.frame(fu=new.time,age=60))
pred.tensor <- predict(mod.tensor,data.frame(fu=new.time,age=60))

lwd1 <- 2

par(mfrow=c(1,1))
plot(new.time,pred.cst$haz,type="l",ylim=c(0,0.2),main="hazard vs time",
xlab="time since diagnosis (years)",ylab="hazard",col="blue3",lwd=lwd1)
segments(x0=new.time[1:99],x1=new.time[2:100],y0=pred.pwcst$haz[1:99],col="lightblue2",lwd=lwd1)
lines(new.time,pred.lin$haz,col="green3",lwd=lwd1)
lines(new.time,pred.lin.age$haz,col="yellow",lwd=lwd1)
lines(new.time,pred.lin.inter.age$haz,col="orange",lwd=lwd1)
lines(new.time,pred.spline.inter.age$haz,col="red",lwd=lwd1)
lines(new.time,pred.tensor$haz,col="black",lwd=lwd1)
legend("topright",
legend=c("cst","pwcst","lin","lin.age","lin.inter.age","spline.inter.age","tensor"),
col=c("blue3","lightblue2","green3","yellow","orange","red","black"),
lty=rep(1,7),lwd=rep(lwd1,7))


# you can also calculate the hazard yourself with the lpmatrix option.
# For example, compare the following predictions:
haz.tensor <- pred.tensor$haz

X.tensor <- predict(mod.tensor,data.frame(fu=new.time,age=60),type="lpmatrix")
haz.tensor.lpmatrix <- exp(X.tensor%*%mod.tensor$coefficients)

summary(haz.tensor.lpmatrix - haz.tensor)

#---------------- The 95% confidence intervals can be calculated like this:

# standard errors from the Bayesian covariance matrix Vp
std <- sqrt(rowSums((X.tensor%*%mod.tensor$Vp)*X.tensor))

qt.norm <- stats::qnorm(1-(1-0.95)/2)
haz.inf <- as.vector(exp(X.tensor%*%mod.tensor$coefficients-qt.norm*std))
haz.sup <- as.vector(exp(X.tensor%*%mod.tensor$coefficients+qt.norm*std))

# checking that they are similar to the ones given by the predict function
summary(haz.inf - pred.tensor$haz.inf)
summary(haz.sup - pred.tensor$haz.sup)


#-------------------------------------------------------- example 2

library(survPen)
data(datCancer) # simulated dataset with 2000 individuals diagnosed with cervical cancer

# model : unidimensional penalized spline for time since diagnosis with 5 knots
f1 <- ~smf(fu,df=5)
# when knots are not specified, quantiles are used. For example, for the term "smf(x,df=df1)",
# the vector of knots will be: quantile(unique(x),seq(0,1,length=df1)) 

# you can specify your own knots if you want
# f1 <- ~smf(fu,knots=c(0,1,3,6,8))

# hazard model
mod1 <- survPen(f1,data=datCancer,t1=fu,event=dead,expected=NULL,method="LAML")
summary(mod1)

# to see where the knots were placed
mod1$list.smf

# with LCV instead of LAML
mod1bis <- survPen(f1,data=datCancer,t1=fu,event=dead,expected=NULL,method="LCV")
summary(mod1bis)

# hazard model taking into account left truncation (not representative of cancer data, 
# the begin variable was simulated for illustration purposes only)
mod2 <- survPen(f1,data=datCancer,t0=begin,t1=fu,event=dead,expected=NULL,method="LAML")
summary(mod2)

# excess hazard model
mod3 <- survPen(f1,data=datCancer,t1=fu,event=dead,expected=rate,method="LAML")
summary(mod3)

# compare the predictions of the models
new.time <- seq(0,5,length=50)
pred1 <- predict(mod1,data.frame(fu=new.time))
pred1bis <- predict(mod1bis,data.frame(fu=new.time))
pred2 <- predict(mod2,data.frame(fu=new.time))
pred3 <- predict(mod3,data.frame(fu=new.time))

# LAML vs LCV
par(mfrow=c(1,2))
plot(new.time,pred1$haz,type="l",ylim=c(0,0.2),main="LCV vs LAML",
xlab="time since diagnosis (years)",ylab="hazard")
lines(new.time,pred1bis$haz,col="blue3")
legend("topright",legend=c("LAML","LCV"),col=c("black","blue3"),lty=c(1,1))

plot(new.time,pred1$surv,type="l",ylim=c(0,1),main="LCV vs LAML",
xlab="time since diagnosis (years)",ylab="survival")
lines(new.time,pred1bis$surv,col="blue3")



# hazard vs excess hazard
par(mfrow=c(1,2))
plot(new.time,pred1$haz,type="l",ylim=c(0,0.2),main="hazard vs excess hazard",
xlab="time since diagnosis (years)",ylab="hazard")
lines(new.time,pred3$haz,col="green3")
legend("topright",legend=c("overall","excess"),col=c("black","green3"),lty=c(1,1))

plot(new.time,pred1$surv,type="l",ylim=c(0,1),main="survival vs net survival",
xlab="time",ylab="survival")
lines(new.time,pred3$surv,col="green3")
legend("topright",legend=c("overall survival","net survival"), col=c("black","green3"), lty=c(1,1)) 

# hazard vs excess hazard with 95% Bayesian confidence intervals (based on Vp matrix, 
# see predict.survPen)
par(mfrow=c(1,1))
plot(new.time,pred1$haz,type="l",ylim=c(0,0.2),main="hazard vs excess hazard",
xlab="time since diagnosis (years)",ylab="hazard")
lines(new.time,pred3$haz,col="green3")
legend("topright",legend=c("overall","excess"),col=c("black","green3"),lty=c(1,1))

lines(new.time,pred1$haz.inf,lty=2)
lines(new.time,pred1$haz.sup,lty=2)

lines(new.time,pred3$haz.inf,lty=2,col="green3")
lines(new.time,pred3$haz.sup,lty=2,col="green3")



#-------------------------------------------------------- example 3

library(survPen)
data(datCancer) # simulated dataset with 2000 individuals diagnosed with cervical cancer

# models: tensor product smooth vs tensor product interaction of time since diagnosis and 
# age at diagnosis. Smoothing parameters are estimated via LAML maximization
f2 <- ~tensor(fu,age,df=c(5,5))

f3 <- ~tint(fu,df=5)+tint(age,df=5)+tint(fu,age,df=c(5,5))

# hazard model
mod4 <- survPen(f2,data=datCancer,t1=fu,event=dead)
summary(mod4)

mod5 <- survPen(f3,data=datCancer,t1=fu,event=dead)
summary(mod5)

# predictions
new.age <- seq(50,90,length=50)
new.time <- seq(0,7,length=50)

Z4 <- outer(new.time,new.age,function(t,a) predict(mod4,data.frame(fu=t,age=a))$haz)
Z5 <- outer(new.time,new.age,function(t,a) predict(mod5,data.frame(fu=t,age=a))$haz)

# color settings
col.pal <- colorRampPalette(c("white", "red"))
colors <- col.pal(100)

facet <- function(z){

	facet.center <- (z[-1, -1] + z[-1, -ncol(z)] + z[-nrow(z), -1] + z[-nrow(z), -ncol(z)])/4
	cut(facet.center, 100)
	
}

# plot the hazard surfaces for both models
par(mfrow=c(1,2))
persp(new.time,new.age,Z4,col=colors[facet(Z4)],main="tensor",theta=30,
xlab="time since diagnosis",ylab="age at diagnosis",zlab="hazard",ticktype="detailed")
persp(new.time,new.age,Z5,col=colors[facet(Z5)],main="tint",theta=30,
xlab="time since diagnosis",ylab="age at diagnosis",zlab="hazard",ticktype="detailed")

#-------------------------------------------------------- example 4

library(survPen)
data(datCancer) # simulated dataset with 2000 individuals diagnosed with cervical cancer

# model : tensor product spline for time, age and yod (year of diagnosis)
# yod is not centered here since it does not create instability, but be careful in practice
# and consider centering your covariates if you encounter convergence issues
f4 <- ~tensor(fu,age,yod,df=c(5,5,5))

# excess hazard model
mod6 <- survPen(f4,data=datCancer,t1=fu,event=dead,expected=rate)
summary(mod6)


# predictions of the surfaces for ages 50, 60, 70 and 80
new.year <- seq(1990,2010,length=30)
new.time <- seq(0,5,length=50)

Z_50 <- outer(new.time,new.year,function(t,y) predict(mod6,data.frame(fu=t,yod=y,age=50))$haz)
Z_60 <- outer(new.time,new.year,function(t,y) predict(mod6,data.frame(fu=t,yod=y,age=60))$haz)
Z_70 <- outer(new.time,new.year,function(t,y) predict(mod6,data.frame(fu=t,yod=y,age=70))$haz)
Z_80 <- outer(new.time,new.year,function(t,y) predict(mod6,data.frame(fu=t,yod=y,age=80))$haz)


# plot the hazard surfaces for a given age
par(mfrow=c(2,2))
persp(new.time,new.year,Z_50,col=colors[facet(Z_50)],main="age 50",theta=20,
xlab="time since diagnosis",ylab="yod",zlab="excess hazard",ticktype="detailed")
persp(new.time,new.year,Z_60,col=colors[facet(Z_60)],main="age 60",theta=20,
xlab="time since diagnosis",ylab="yod",zlab="excess hazard",ticktype="detailed")
persp(new.time,new.year,Z_70,col=colors[facet(Z_70)],main="age 70",theta=20,
xlab="time since diagnosis",ylab="yod",zlab="excess hazard",ticktype="detailed")
persp(new.time,new.year,Z_80,col=colors[facet(Z_80)],main="age 80",theta=20,
xlab="time since diagnosis",ylab="yod",zlab="excess hazard",ticktype="detailed")

########################################

survPen documentation built on April 4, 2025, 3:06 a.m.
