In mdbrown/partlyconditional: Partly Conditional Logistic and Cox models

Introduction

knitr::opts_chunk$set(echo = TRUE, warning = FALSE, message = FALSE)
options(width = 100)

This package provides functions to fit partly conditional (PC) risk models, which are a helpful class of predictive models often used in medical contexts where long-term follow-up is available and interest lies in predicting patients' risks for a future adverse outcome using repeatedly measured predictors over time.

The partlyconditional R package provides model fitting procedures for (medical) contexts where long term follow-up information is available on a patient population along with repeated measures of patient health and other biological markers collected across time. Interest lies in modeling a patients' future risk of an adverse event in the next $t_0$ time interval, given survival to time $s$, as a function of longitudinal marker history $H(s)$:

$$ R(\tau_0 | s, H(s)) = P(T \le s + \tau_0 | T > s, H(s)) $$ Where $T$ is time until the event of interest.

One approach is to use 'partly conditional' models, where risk is specified by first conditioning on the set of known marker history collected until time $s$. This package provides functions to fit two classes of partly conditional models; one based on logistic regression (PC.logistic coming soon), and another which models the conditional hazard of failure using a Cox proportional hazards model (PC.Cox).

PC Cox models

The PC.Cox function fits a partly conditional Cox model of the form:

$$ \lambda(\tau | H(s)) = \lambda_0(\tau) exp(\alpha B(s) + \beta Z + \gamma h(Y)) $$

where $\lambda_0$ is the unkown baseline hazard, $B(s)$ is a spline basis for the time of measurement, $Z$ includes covariate and patient information constant through time, and $h(Y)$ is a function of the marker values, such as last observed marker value or a smoothed marker value (see below). Absolute $\tau$ year estimates of risk conditional on measurement time $s$ are calculated using the Breslow estimator.

Smoothing marker trajectories with BLUPs.

Before fitting the PC model, partlyconditional functions include procedures to smooth a marker $Y_i$'s univariate trajectory through time by fitting mixed effect models. This is helpful for improving model performance when markers are measured with error. For each marker $Y$, we model the biomarker process using a linear mixed effect model of the form:

$$ Y_{ij} = \beta_0 + \beta_1 M_{j} + u_{0i} + u_{1i}M_j + \varepsilon_{ij} $$ for measurement $j$ recorded from subject $i$ at measurement time $M_j$. The measurement error, $\varepsilon_{ij}$, is assumed to have a normal distribution. As shown above, we model marker trajectories using fixed intercept and linear effect of measurement time ($\beta_0$ and $\beta_1$) along with random intercepts and slopes that vary across individuals ($u_{0i}$ and $u_{1_i}$).

To obtain smoothed marker values to use for a new prediction, we use these mixed effect models to estimate the best linear unbiased predictors (BLUPs) $\hat{h}(Y)$ using the known marker values recorded up to some time $s$.

Please see references below for further details.

Tutorial

Load package

Package can be downloaded directly from Github using the devtools package.

library(devtools)
###install 
devtools::install_github("mdbrown/partlyconditional")

All package code is also available here.

#load libraries
library(partlyconditional)
library(tidyverse)

Simulated data

For this tutorial, we use data on 478 simulated observations from 100 hypothetical individuals with repeated marker measurements. 'marker_1' was simulated to be associated with the outcome status, while 'marker_2' is pure noise.

data(pc_data)

head(pc_data)

Note that pc_data is in 'long' format, with one row per measurement time. Each individual has a unique numeric subject id (sub.id) where event time (time) and event status (status) are repeated across marker measurement times (meas.time) given in months. log.meas.time is the transformation $log(s + 1)$ applied to the measurement time that we use for modeling.

Fit a partly conditional Cox model

We fit the model using the transformed log.meas.time and two markers. Raw marker values are used in the model as predictors since use.BLUP is set to FALSE for both markers.

pc.model.1 <-  PC.Cox(
        id = "sub.id",
        stime = "time",
        status = "status",
        measurement.time = "log.meas.time",
        markers = c("marker_1", "marker_2"),
        data = pc_data,
        use.BLUP = c(FALSE, FALSE), #no modeling of markers through time
        knots.measurement.time = NA) #no spline used 

pc.model.1

pc.model.1$model.fit #direct access to the coxph model object

We can access the objects in the model fit by using the $ operator, since PC.Cox returns a list. See names(pc.model.1) to see all information recorded in the model fit.

Calculate BLUPs

Instead of using raw marker values as predictors, which may have been measured with error, we first smooth marker measurements using mixed effect models univariately on each marker. We specify that BLUPs should be calculated for each marker by setting use.BLUP = c(TRUE, TRUE). For this model fit, we also set knots.measurement.time = 3 to model measurement time using natural cubic splines.

For each marker with use.BLUP element equal to TRUE, we model the marker as function of measurement time using:

lme(marker ~ 1 + measurement.time, random = ~ 1 + measurement.time | id)

and estimate best linear unbiased predictors BLUPs for each marker using this set of models.

pc.model.2 <-  PC.Cox(
        id = "sub.id",
        stime = "time",
        status = "status",
        measurement.time = "meas.time",
        markers = c("marker_1", "marker_2"),
        data = pc_data,
        use.BLUP = c(TRUE, TRUE), #smooth marker trajectories 
        knots.measurement.time = 3) # model measurement time using splines 

pc.model.2

#direct access to mixed effect model fits
pc.model.2$marker.blup.fit[[1]] #same for marker_2

Make predictions

We can now use the model fits above and predict the risk at fixed prediction times conditional on marker history. We first select some patients, for whom which we would like to calculate risk conditional on up to 18 months of marker data.

# choose to make predictions for subject id 3, 9 and 74
#using marker measurements up to month 18
newd <- dplyr::filter(pc_data, is.element(sub.id, c( 3, 9, 74)), meas.time <= 18)
newd

Next, we use predict to estimate 12 and 24 month risk conditional on last marker time measured.

myp.1 <- predict(pc.model.1 , 
                 newdata  = newd, 
                 prediction.time = c(12, 24))
#estimate risk conditional on last recorded measurement time
#for each individual
myp.1

predict produces a data.frame consisting of marker values and measurement times for the most recent marker measurement observed for each individual. Risk of experiencing the event of interested within 12 and 24 months is estimated for each individual conditional on surviving to the most recent marker measurement recorded for that individual. This means that subject 3 has an estimated 12 month risk of ~9% conditional on surviving 18 months from baseline, whereas subject 74 has a 12 month risk of ~8% conditional on surviving 12 months from baseline.

If we make predictions from a model that includes BLUPs to smooth markers and/or splines to model measurement time, these transformations are included in the output.

myp.2 <- predict(pc.model.2 , 
                 newdata  = newd, 
                 prediction.time = c(12, 24))
#also includes information on measurement time splines and 
#marker blups 
myp.2

Plot risk trajectory

Below we display some code that displays trajectory of risk for subject 9 estimated using both models.

newd <- filter(pc_data, sub.id ==9, meas.time <=18)
newd 

#predict 1-12 month risk after 
myp.traj.1 <- predict(pc.model.1 , newdata  = newd, prediction.time = c(1:12))
myp.traj.2 <- predict(pc.model.2 , newdata  = newd, prediction.time = c(1:12))

myp.traj.1$model = "PC Cox model 1"
myp.traj.2$model = "PC Cox model 2"

myp.traj <- bind_rows(myp.traj.1, myp.traj.2)

myp.traj %>% gather( "time_risk", "risk", risk_1:risk_12) %>% 
   select(time_risk, risk, model) %>% 
   transform(month = as.numeric(gsub("[^0-9]","", time_risk))) %>%
  ggplot(aes(x = month, y = risk, color = model)) + geom_point() + geom_path()

References {#ref}

Zheng YZ, Heagerty PJ. Partly conditional survival models for longitudinal data. Biometrics. 2005;61:379–391.

Maziarz, M., Heagerty, P., Cai, T. and Zheng, Y. (2017), On longitudinal prediction with time-to-event outcome: Comparison of modeling options. Biom, 73: 83–93. doi:10.1111/biom.12562

mdbrown/partlyconditional documentation built on May 22, 2019, 12:38 p.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

Tweet to @rdrrHQ

GitHub issue tracker

ian@mutexlabs.com