This vignette illustrates performing regression of cumulative incidence probabilities via pseudo-values.
knitr::opts_chunk$set( collapse = TRUE, comment = "#>", cache = TRUE )
suppressPackageStartupMessages({ library(dplyr) library(ggplot2) library(SurvUtils) })
Time to the event of interest is simulated from an exponential distribution with rate parameter:
$$ \lambda(A, X_{1}, X_{2}) = \lambda_{0}\exp(A\beta_{A} + X_{1}\beta_{1} + X_{2}\beta_{2}) $$ Here $\lambda_{0}$ is the base event rate, $A \in {0, 1}$ is the treatment arm, and $(X_{1}, X_{2})$ are covariates. The coefficients are set such that treatment $A = 1$ decreases the event rate by 50\%, a one standard deviation increase in $X_{1}$ increases the event rate by 20\%, and $X_{2}$ has no effect. Time to the event of interest is subject to a competing risk from death, with rate $\lambda_{D} = 0.25$, and independent censoring, with rate $\lambda_{C} = 0.25$.
set.seed(101) # Generate covariate frame. n <- 2 * 1e3 df_x <- data.frame( arm = rep(c(0, 1), each = 1e3), x1 = stats::rnorm(n), x2 = stats::rnorm(n) ) # Center the scale the covariates. df_x$x1 <- scale(df_x$x1) df_x$x2 <- scale(df_x$x2) # Simulate data. df <- SurvUtils::GenCRData( base_death_rate = 0.25, base_event_rate = 1.0, beta_event = c(log(0.5), log(1.2), 0), censoring = 0.25, covariates = data.matrix(df_x) ) df <- cbind(df, df_x) # Tabulate censorings, events, and deaths by treatment arm. df %>% dplyr::group_by(arm) %>% dplyr::summarise( n = dplyr::n(), n_censor = sum(status == 0), n_event = sum(status == 1), n_death = sum(status == 2) )
Pseudo-values at a specified time point $\tau$ are generated with the SurvUtils::GenPseudo
function. The pseudo-value is defined as:
$$ \hat{\theta}{i}(\tau) = n \cdot \hat{\theta}(\tau) - (n - 1)\cdot\hat{\theta}{(-i)}(\tau)$$
where $\hat{\theta}{i}(\tau)$ is the pseudo-value for subject $i$ at time $\tau$, $n$ is the sample size, $\hat{\theta}(\tau)$ is the value of the cumulative incidence curve (CIC) at time $\tau$ based on the full sample, and $\hat{\theta}{(-i)}(\tau)$ is the CIC at time $\tau$ based on the jackknifed sample with subject $i$ excluded. Pseudo-values take censoring into account during their construction, and once calculated, can be modeled in the same way as any continuous outcome.
Below, pseudo-values for the CIC at time $\tau = 2$ are modeled via linear regression. Consequently, the coefficients are interpreted as risk differences. For example, the coefficient on arm
indicates that the treatment ($A = 1$) is estimated to reduce the cumulative incidence of the event of interest at time $\tau = 2$ by 17.9\%, holding $(X_{1}, X_{2})$ constant. Meanwhile, a standard deviation increase in $X_{1}$ is associated with a 6.7\% increase in the cumulative incidence of the event of interest at time $\tau = 2$, holding $(A, X_{2})$ constant. As expected, $A$ is associated with a significant reduction in the cumulative incidence, $X_{1}$ is associated with a comparatively smaller but significant increase, and $X_{2}$ has no significant effect.
# Generate pseudo-values for the cumulative incidence. df <- SurvUtils::GenPseudo(df, tau = 2, type = "cic") # Fit a linear pseudo-value regression. fit <- stats::lm(pseudo ~ arm + x1 + x2, data = df) summary(fit)
The generative model was specified in terms of multiplicative effects of the treatment and covariates on the rate parameter for the event of interest, whereas our analysis estimates rate differences at a particular time point. Since the generative model for the data is known, we can determine the ground truth values for the rate differences by simulating a large data set in the absence of censoring, then fitting the pseudo-value regression at the same time point, as demonstrated below. Uncertainty in the ground truth parameter estimates can be made arbitrarily small by making the sample size sufficiently large.
set.seed(101) # Generate covariate frame. n <- 2 * 1e4 df_x <- data.frame( arm = rep(c(0, 1), each = 1e4), x1 = stats::rnorm(n), x2 = stats::rnorm(n) ) # Center the scale the covariates. df_x$x1 <- scale(df_x$x1) df_x$x2 <- scale(df_x$x2) # Simulate data. df <- SurvUtils::GenCRData( base_death_rate = 0.25, base_event_rate = 1.0, beta_event = c(log(0.5), log(1.2), 0), censoring = 0.0, covariates = data.matrix(df_x) ) df <- cbind(df, df_x) # Generate pseudo-values for the cumulative incidence. df <- SurvUtils::GenPseudo(df, tau = 2, type = "cic") # Fit a linear pseudo-value regression. fit <- stats::lm(pseudo ~ arm + x1 + x2, data = df) summary(fit)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.