library(knitr) opts_chunk$set( fig.align = "center", cache = TRUE, message = FALSE, fig.height = 5, fig.width = 5)
library(magrittr) library(dplyr) library(ggplot2) theme_set(theme_bw()) library(pam) library(mgcv) library(survival) Set1 <- RColorBrewer::brewer.pal(9, "Set1")
We first illustrate the estimation on non-linear smooth effects using
the veteran data from the survival package:
data("veteran", package="survival") veteran %<>% mutate( trt = 1+(trt == 2), prior = 1*(prior==10)) %>% filter(time < 400) head(veteran)
There are two continuous covariates in the data set that could potentially
exhibit non-linear effects, age and karno. We fit a Cox-PH model and
a PAM of the form
$$ \lambda(t|x) = \lambda_0(t)\exp(f(x_{karno})) $$
and
$$ \lambda(t|x) = \exp(f_0(t_j) + f(x_{karno})) $$ respectively:
## Cox-PH cph <- coxph(Surv(time, status) ~ pspline(karno, df=0), data=veteran) ## PAM ped <- split_data(Surv(time, status) ~ ., data=veteran, id="id") pam <- gam(ped_status ~ s(tend) + s(karno, bs="ps"), data=ped, family=poisson(), offset=offset)
# crate data set with varying karno values (from min to max in 100 steps) # and add term contribution of karno from PAM and Cox-PH models karno_df <- ped %>% make_newdata(expand = "karno", length.out = 100) %>% add_term(pam, term="karno") %>% mutate( cox = predict(cph, newdata=., type="terms")[, "pspline(karno, df = 0)"] - predict(cph, newdata=data.frame(karno=mean(veteran$karno)))) ggplot(karno_df, aes(x=karno, ymin=low, ymax=high)) + geom_line(aes(y=fit, col="PAM")) + geom_ribbon(alpha=0.2) + geom_line(aes(y=cox, col="Cox PH"))+ scale_colour_manual(name="Method",values=c("Cox PH"=Set1[1],"PAM"=1)) + xlab(expression(x[plain(karno)])) + ylab(expression(hat(f)(x[plain(karno)])))
Both methods are in good agreement. The higher the Karnofsky-Score the lower the
expected log-hazard. For further evaluation of the Karnofsky Score effect using
time-varying terms of the form $f(t)\cdot x_{karno}$ and $f(x_{karno}, t)$ see
vignette("tveffects", package="pam").
In the following we consider another example using using data presented in the
respective vignette in the survival package
(see vignette("splines", package = "survival")).
The example presented in the vignette goes as follows, using the standard
base R workflow and termplot for visualization:
data("mgus", package = "survival") mfit <- coxph(Surv(futime, death) ~ sex + pspline(age, df = 4), data = mgus) mfit
termplot(mfit, term = 2, se = TRUE, col.term = 1, col.se = 1)
The equivalent fit using PAMs requires the additional step of transforming the
data into a suitable format. We then use mgcv::gam to fit
the model:
mgus.ped <- split_data(Surv(futime, death)~sex + age, data = mgus, id = "id") head(mgus.ped) pamfit <- gam(ped_status ~ s(tend) + sex + s(age, bs = "ps"), data = mgus.ped, method = "REML", family = poisson(), offset = offset) summary(pamfit)
For visualization of the smooth effects we can use the default mgcv::plot.gam
function:
layout(matrix(1:2, nrow = 1)) termplot(mfit, term = 2, se = TRUE, col.term = 1, col.se = 1) plot(pamfit, select = 2, se = TRUE, ylim = c(-4, 3.5))
In this example the PAM approach estimates a linear effect of age, which is
consistent with the estimation using coxph, as there is only weak non-linearity.
Another example from the same vignette shows that estimated effects are very similar if the effect of the covariate is in fact strongly non-linear:
fit <- coxph(Surv(futime, death) ~ age + pspline(hgb, 4), mgus2) mgus2.ped <- split_data(Surv(futime, death)~age + hgb, data = mgus2, id = "id") pamfit2 <- gam(ped_status~s(tend) + age + s(hgb), data = mgus2.ped, family = poisson(), offset = offset)
layout(matrix(1:2, nrow = 1)) termplot(fit, term = 2, se = TRUE, col.term = 1, col.se = 1) plot(pamfit2, select = 2, se = TRUE, ylim = c(-0.5, 2))
The vignette in the survival package further discusses enforcing monotonicity
contraints on the effect of hgb. These can be achieved here more easily
using the functionality provided in the scam package (see also @pya2015).
The usage is exactly the same as before, replacing the call to gam with a
call to scam and specifying bs = "mpd". Note that the fit using contraints
takes considerably longer compared to the unconstrained fit.
library(scam) mpam <- scam(ped_status ~ s(tend) + age + s(hgb, bs = "mpd"), data = mgus2.ped, family = poisson(), offset = offset)
layout(matrix(1:2, nrow = 1)) plot(pamfit2, select = 2, se = TRUE, ylim = c(-0.5, 2), main="Unconstrained fit") # 1.72 = intercept difference between pamfit2 and mpam const <- abs(coef(pamfit2)[1] - coef(mpam)[1]) plot(mpam, select = 2, se = TRUE, shift = const, ylim = c(-0.5-const, 2-const), main="Fit with monotonicity constraint")
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.