knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
The sm_predict()
function calculates kernel-smoothed predictions from regression models (i.e. outputs from the predict()
function). The following examples focus on time-to-event endpoints. The example data set is simulated from a Cox regression model.
$$ h(t) = h_0(t)e^{\textbf{X}\beta} $$
with $h_0(t) = 1$ and $\textbf{X}\beta = 0.02 * age - 0.2 * marker$. The independent variables age and marker are independent from one another. The example data set containts the following columns:
time Years from cancer treatment to death age Age at treatment marker Marker level at treatment survt1 True 1 year survival probability
library(sjosmooth) library(dplyr) # loading data from sjosmooth github page load(url("https://github.com/ddsjoberg/sjosmooth/blob/master/examples/cancertx.rda?raw=true")) cancertx %>% select(time, age, marker, survt1) %>% head() %>% knitr::kable()
The simualted data contains r nrow(cancertx) %>% format(big.mark = ",", scientific=FALSE)
observations.
Kernel smoothing can be computationally intense on large data sets. The following code was run to get the example object sm_regression_ex1
and the results saved to GitHub.
library(survival) library(sjosmooth) sm_predict_ex1 = sm_predict( data = cancertx, method = "coxph", formula = Surv(time) ~ marker, newdata = dplyr::data_frame(marker = seq(5, 15, 0.5), time = 1), type = "survival", verbose = TRUE )
The data in the cancertx
data frame was simulated from a Cox regression model with linear predictor.
$$ \textbf{X}\beta = 0.02 * age - 0.2 * marker $$
In the model age and marker level are independent, and therefore, the univariate model the slope coefficient for marker remains $\beta_{marker} = -0.2$
library(ggplot2) # loading data and example results from sjosmooth github page load(url("https://github.com/ddsjoberg/sjosmooth/blob/master/examples/sm_predict_ex1.rda?raw=true")) ggplot(sm_predict_ex1, aes(x = marker, y = .fitted)) + geom_line() + geom_ribbon(aes(ymin = .fitted.ll, ymax = .fitted.ul), alpha = 0.3) + geom_line( data = cancertx %>% filter(marker >= 5 & marker <= 15), aes(x = marker, y = survt1_marker), linetype = "dashed" ) + labs( y = "1 year survival probability" )
The dashed line is the true one year survival, and the solid line is the estimated survival from sm_predict()
.
In this example, the 1 year survival will be estimated by both age and marker level. The following code was run in advance, and the results saved to GitHub.
sm_predict_ex2 <- sjosmooth::sm_predict( data = cancertx, method = "coxph", formula = Surv(time) ~ marker + age, newdata = list( marker = seq(7, 13, 1), age = seq(20, 40, 2) ) %>% purrr::cross_df() %>% dplyr::mutate(time = 1), type = "survival", verbose = TRUE )
# loading example results load(url("https://github.com/ddsjoberg/sjosmooth/blob/master/examples/sm_predict_ex2.rda?raw=true")) # adding true risk to dataset, and calculating difference between estiamted and true rate sm_predict_ex2 = sm_predict_ex2 %>% mutate( survt1 = exp(-exp((1/50) * age - 0.2 * marker)), surv_diff = survt1 - .fitted ) # plot histogram of differences between fitted and true sm_predict_ex2 %>% ggplot(aes(x = surv_diff)) + geom_histogram(bins = 25) + labs( title = "Difference Distribution", x = "Difference in estimated and true survival" )
The code below will create a 3-D surface plot. The resulting HTML-widgets figures are too large to include in the package, however.
# library(plotly) # # extracting unique values of independent variables # age.seq = # sm_predict_ex2 %>% # select(age) %>% # distinct() %>% # arrange(age) %>% # pull() # # marker.seq = # sm_predict_ex2 %>% # select(marker) %>% # distinct() %>% # arrange(marker) %>% # pull() # # # Plotting the true survival probabilities # sm_predict_ex2 %>% # select(marker, age, survt1) %>% # tidyr::spread(age, survt1) %>% # select(-marker) %>% # plot_ly(z = ~ as.matrix(.), # x = marker.seq, # y = age.seq) %>% # add_surface(showscale=FALSE) %>% # layout( # title = "True Risk of Death within 1 Year", # scene = list( # xaxis = list(title = "Marker"), # yaxis = list(title = "Age"), # zaxis = list(title = "Survival Prob.") # )) # # # Plotting the estimated survival probabilities from sm_predict() # sm_predict_ex2 %>% # select(marker, age, .fitted) %>% # tidyr::spread(age, .fitted) %>% # select(-marker) %>% # plot_ly(z = ~ as.matrix(.), # x = marker.seq, # y = age.seq) %>% # add_surface(showscale=FALSE) %>% # layout( # title = "Estimated Risk of Death within 1 Year", # scene = list( # xaxis = list(title = "Marker"), # yaxis = list(title = "Age"), # zaxis = list(title = "Survival Prob.") # ))
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.