prediction: Prediction probabilities for Cox proportional hazard, Shared,... In frailtypack: Shared, Joint (Generalized) Frailty Models; Surrogate Endpoints

Description

For Cox proportional hazard model

A predictive probability of event between t and horizon time t+w, with w the window of prediction.

For Gamma Shared Frailty model for clustered (not recurrent) events

Two kinds of predictive probabilities can be calculated:

- a conditional predictive probability of event between t and horizon time t+w, i.e. given a specific group
- a marginal predictive probability of event between t and horizon time t+w, i.e. averaged over the population

For Gaussian Shared Frailty model for clustered (not recurrent) events

Two kinds of predictive probabilities can be calculated:

- a conditional predictive probability of event between t and horizon time t+w, i.e. given a specific group and given a specific Gaussian random effect η
- a marginal predictive probability of event between t and horizon time t+w, i.e. averaged over the population

For Gamma Shared Frailty model for recurrent events

Two kinds of predictive probabilities can be calculated:

- a marginal predictive probability of event between t and horizon time t+w, i.e. averaged over the population
- a conditional predictive probability of event between t and horizon time t+w, i.e. given a specific individual

This prediction method is the same as the conditional gamma prediction method applied for clustered events (see formula Pcond before).

For Gaussian Shared Frailty model for recurrent events

Two kinds of predictive probabilities can be calculated:

- a marginal predictive probability of event between t and horizon time t+w, i.e. averaged over the population
- a conditional predictive probability of event between t and horizon time t+w, i.e. given a specific individual

This prediction method is the same as the conditional Gaussian prediction method applied for clustered events (see formula Pcond before).

All these predictions can be computed in two ways along the time scale:
- a cumulative probability of developing the event between t and t+w (with t fixed and a varying window of prediction w);
- the probability, at a specific time, of developing the event in the next w (i.e. for a varying prediction time t and a fixed window of prediction).

See Details.
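As a rough illustration of the two time grids, the sketch below evaluates the probability of an event between t and t+w, conditional on being event-free at t, as 1 - S(t+w)/S(t). The Weibull survival function `surv` and its parameters are hypothetical stand-ins for a fitted model; they are not part of frailtypack, and the conditional form is an assumption made for the example.

```r
# Hypothetical survival function (Weibull); stands in for a fitted model.
surv <- function(t, shape = 1.5, scale = 10) exp(-(t / scale)^shape)

# Probability of an event in (t, t+w], given no event up to t (assumed form).
pred_prob <- function(t, w) 1 - surv(t + w) / surv(t)

# Fixed prediction time t = 10, varying window w = 1, ..., 10.
cumulative <- pred_prob(t = 10, w = seq(1, 10, by = 1))

# Varying prediction time t = 10, ..., 20, fixed window w = 5.
dynamic <- pred_prob(t = seq(10, 20, by = 1), w = 5)

print(round(cumulative, 3))
print(round(dynamic, 3))
```

With t fixed, the probabilities grow with the window; with w fixed, they track how the risk over the next w units changes as the prediction time moves forward.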

For Joint Frailty model

Predictions can be calculated for two types of event: a terminal event or a new recurrent event, given the patient's characteristics.

- Prediction of death knowing patients' characteristics :

The aim is to predict the probability of death in a specific time window given the history of patient i before the time of prediction t. The history HiJ,l (l=1,2) is the information on covariates before time t, together with the number of recurrences and their times of occurrence. Three types of marginal probabilities are computed:

- a prediction of death between t and t+w given that the patient had exactly J recurrences (HiJ,1) before t
- a prediction of death between t and t+w given that the patient had at least J recurrences (HiJ,2) before t
- a prediction of death between t and t+w considering the recurrence history only in the parameters estimation. It corresponds to the average probability of death between t and t+w for a patient with these given characteristics.

- Prediction of risk of a new recurrent event knowing patients' characteristics :

The aim is to predict the probability of a new recurrent event in a specific time window given the history of patient i before the time of prediction t. The history HiJ is the information on covariates before time t, together with the number of recurrences and their times of occurrence. The marginal probability computed is a prediction of a new recurrent event between t and t+w given that the patient had exactly J recurrences (HiJ) before t.

All these predictions can be computed in two ways:
- a cumulative probability of developing the event between t and t+w (with t fixed and a varying window of prediction w);
- the probability, at a specific time, of developing the event in the next w (i.e. for a varying prediction time t and a fixed window of prediction).

See Details.

With Gaussian frailties (η), the same expressions are used but with u_i^J replaced by exp(J η_i), and g(η) corresponds to the Gaussian distribution.

For Joint Nested Frailty models

Prediction of the probability of developing a terminal event between t and t+w for subject i who has survived to time t, based on their own visiting and disease histories and those of other family members observed by time t.

Let (YfiR(t)) be the history of subject i in family f, before time t, which includes all the recurrent events and covariate information. For disease history, let TfiD(t) = min(Tfi,t) be the observed time to an event before t ; δfiD(t) the disease indicator by time t and XfiD(t) the covariate information observed up to time t. We define the family history of subject i in family f by which includes the visiting and disease history of all subjects except for subject i in family f as well as their covariate information by time t.

The prediction probability can be written as :

For Joint models for longitudinal data and a terminal event

The predicted probabilities are calculated in a specific time window given the history of biomarker measurements before the time of prediction t (ƴi(t)). The probabilities are also conditional on covariates before time t and on the subject being at risk at t. The marginal predicted probability of the terminal event is computed.

These probabilities can be calculated at several time points, with a fixed time of prediction t and a varying window w, or with a fixed window w and a varying time of prediction t. See Details for an example of how to construct time windows.

For Trivariate joint models for longitudinal data, recurrent events and a terminal event

The predicted probabilities are calculated in a specific time window given the history of biomarker measurements ƴi(t) and recurrences HiJ,1 (complete history of recurrences with known J number of observed events) before the time of prediction t. The probabilities are also conditional on covariates before time t and on the subject being at risk at t. The marginal predicted probability of the terminal event is computed.

The biomarker history can be represented using a linear (trivPenal) or non-linear mixed-effects model (trivPenalNL).

These probabilities can be calculated at several time points, with a fixed time of prediction t and a varying window w, or with a fixed window w and a varying time of prediction t. See Details for an example of how to construct time windows.

Usage

 prediction(fit, data, data.Longi, t, window, event="Both", conditional = FALSE,
     MC.sample=0, individual)

Arguments

- fit: A frailtyPenal, jointPenal, longiPenal, trivPenal or trivPenalNL object.
- data: Data frame for the prediction. See Details.
- data.Longi: Data frame for the prediction used for joint models with longitudinal data. See Details.
- t: Time or vector of times for prediction.
- window: Window or vector of windows for prediction.
- event: Only for joint and shared models. The type of event you want to predict: "Terminal" for a terminal event, "Recurrent" for a recurrent event or "Both". Default value is "Both". For joint nested model, only 'Terminal' is allowed. In a shared model, if you want to predict a new recurrent event then the argument "Recurrent" should be used. If you want to predict a new event from clustered data, do not use this option.
- conditional: Only for prediction method applied on shared models. Provides distinction between the conditional and marginal prediction methods. Default is FALSE.
- MC.sample: Number of samples used to calculate confidence bands with a Monte-Carlo method (with a maximum of 1000 samples). If MC.sample=0 (default value), no confidence intervals are calculated.
- individual: Only for joint nested model. Vector of individuals (of the same family) for whom you want to make predictions.
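The text above only states that confidence bands come from a Monte-Carlo method with MC.sample draws. The sketch below shows one common construction (percentile bands from parameter draws) and is an assumption, not a description of frailtypack internals. The constant-hazard model, the estimate `log_rate_hat` and its standard error `log_rate_se` are hypothetical.

```r
set.seed(1)

# Hypothetical estimate of a log-hazard and its standard error.
log_rate_hat <- log(0.05)
log_rate_se  <- 0.2

# Probability of an event within a window w under a constant hazard.
pred_prob <- function(log_rate, w) 1 - exp(-exp(log_rate) * w)

MC.sample <- 500   # number of Monte-Carlo draws (illustrative value)
w <- 5

# Draw parameter values around the estimate and recompute the prediction.
draws <- rnorm(MC.sample, mean = log_rate_hat, sd = log_rate_se)
sims  <- pred_prob(draws, w)

# Point prediction and percentile band (assumed 95% level).
point <- pred_prob(log_rate_hat, w)
band  <- quantile(sims, probs = c(0.025, 0.975))
print(c(lower = band[[1]], estimate = point, upper = band[[2]]))
```

More draws give smoother bands at a higher computational cost, which is presumably why the number of samples is capped.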

Details

To compute predictions with a prediction time t fixed and a variable window:

 prediction(fit, datapred, t=10, window=seq(1,10,by=1))

Otherwise, you can have a variable prediction time and a fixed window.

 prediction(fit, datapred, t=seq(10,20,by=1), window=5)

Or fix both prediction time t and window.

 prediction(fit, datapred, t=10, window=5)

Building the data frame is an important step. It contains the patient profiles for which you want predictions. To make predictions on a Cox proportional hazard or a shared frailty model, only covariates need to be included. You have to distinguish between numerical and categorical variables (factors). If we fit a shared frailty model with two covariates, sex (factor) and age (numeric), here is the associated data frame for three profiles of prediction.

 datapred <- data.frame(sex=0,age=0)
 datapred$sex <- as.factor(datapred$sex)
 levels(datapred$sex)<- c(1,2)
 datapred[1,] <- c(1,40) # man, 40 years old
 datapred[2,] <- c(2,45) # woman, 45 years old
 datapred[3,] <- c(1,60) # man, 60 years old
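The same three profiles can also be built in a single call. This is a minimal sketch, assuming the fitted model coded sex as a factor with levels 1 and 2 as in the example above.

```r
# Three prediction profiles: sex as a factor, age as a numeric variable.
datapred <- data.frame(
  sex = factor(c(1, 2, 1), levels = c(1, 2)),  # 1 = man, 2 = woman
  age = c(40, 45, 60)
)

# Check that the column types match what the model expects.
str(datapred)
```

Declaring the levels explicitly keeps both categories in the factor even if a profile set happens to contain only one of them.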

Time-dependent covariates: for a time-dependent covariate, the last value of the covariate observed before the prediction time t is used.
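The rule for time-dependent covariates can be illustrated with a small helper. This is only a sketch: `last_value_before`, `visit_time` and `dose` are hypothetical names, and treating "before" as a strict inequality is an assumption.

```r
# Return the covariate value from the latest measurement strictly before t.
last_value_before <- function(times, values, t) {
  earlier <- which(times < t)
  if (length(earlier) == 0) return(NA)   # no measurement before t
  values[earlier[which.max(times[earlier])]]
}

visit_time <- c(0, 5, 12)   # measurement times
dose       <- c(1, 2, 3)    # covariate value recorded at each time

# At prediction time t = 10 the value recorded at time 5 is carried forward.
current <- last_value_before(visit_time, dose, t = 10)
print(current)
```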

Note that in a data frame for either marginal or conditional prediction on a shared frailty model for clustered data, the group must be specified. For marginal predictions this can be any number, as it does not influence the predictions. For conditional predictions, however, the group must also be present in the data set used for model fitting. The conditional predictions apply the empirical Bayes estimate of the frailty from the specified cluster. Here, three individuals belong to group 5.

 datapred <- data.frame(group=0, sex=0,age=0)
 datapred$sex <- as.factor(datapred$sex)
 levels(datapred$sex)<- c(1,2)
 datapred[1,] <- c(5,1,40) # man, 40 years old (cluster 5)
 datapred[2,] <- c(5,2,45) # woman, 45 years old (cluster 5)
 datapred[3,] <- c(5,1,60) # man, 60 years old (cluster 5)

To use the prediction function on joint frailty models and trivariate joint models, the construction is slightly different. In these cases, the prediction for the terminal event takes into account the covariates and also the history of recurrent event times for a patient. You have to create a data frame with the relapse times, the indicator of event, the cluster variable and the covariates. Relapses occurring after the prediction time may be included but will be ignored for the prediction. A joint model with a calendar timescale needs to be fitted with Surv(start,stop,event); relapse times correspond to the "stop" variable and indicators of event correspond to the "event" variable (if event=0, the relapse will not be taken into account). For patients without relapses, all the values of the "event" variable should be set to 0. Finally, the same cluster variable name needs to be in the joint model and in the data frame for predictions ("id" in the following example).

For instance, we observe relapses of a disease and fit a joint model adjusted for two covariates: sex (1: male, 2: female) and chemo (treatment by chemotherapy, 1: no, 2: yes). We describe three different prediction profiles, all treated by chemotherapy: 1) a man with four relapses at 100, 200, 300 and 400 days, 2) a man with only one relapse at 1000 days, 3) a woman without relapse.

 datapred <- data.frame(time=0,event=0,id=0,sex=0,chemo=0)
 datapred$sex <- as.factor(datapred$sex)
 levels(datapred$sex) <- c(1,2)
 datapred$chemo <- as.factor(datapred$chemo)
 levels(datapred$chemo) <- c(1,2)
 datapred[1,] <- c(100,1,1,1,2) # first relapse of the patient 1
 datapred[2,] <- c(200,1,1,1,2) # second relapse of the patient 1
 datapred[3,] <- c(300,1,1,1,2) # third relapse of the patient 1
 datapred[4,] <- c(400,1,1,1,2) # fourth relapse of the patient 1
 datapred[5,] <- c(1000,1,2,1,2) # one relapse at 1000 days for patient 2
 datapred[6,] <- c(100,0,3,2,2) # patient 3 did not relapse

The data can also be the dataset used to fit the joint model. In this case, you will obtain as many prediction rows as patients.
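The history rules for joint models (rows with event=0 are not counted, relapses after the prediction time are ignored) can be checked by hand before calling the prediction function. The sketch below recounts the relapses per patient for the three profiles described earlier. The helper `count_relapses` is hypothetical, and counting relapses at or before t (rather than strictly before) is an assumption.

```r
# Relapse history of the three example profiles (covariates omitted).
history <- data.frame(
  time  = c(100, 200, 300, 400, 1000, 100),
  event = c(1, 1, 1, 1, 1, 0),
  id    = c(1, 1, 1, 1, 2, 3)
)

# Number of relapses per patient that enter the history at prediction time t.
count_relapses <- function(data, t) {
  used <- data$event == 1 & data$time <= t
  tapply(used, data$id, sum)
}

# At t = 500, patient 1 has four relapses; the relapse of patient 2 at
# 1000 days is ignored; patient 3 has none.
J <- count_relapses(history, t = 500)
print(J)
```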

Finally, for the predictions using joint models for longitudinal data and a terminal event and trivariate joint models, a data frame with the history of the biomarker measurements must be provided. It must include data on measurements (values and time points), the cluster variable and the covariates. Measurements taken after the prediction time may be included but will be ignored for the prediction. The same cluster variable name must be in this data frame, in the data frame used for the joint model and in the data frame with the recurrent event and terminal event times. For instance, we observe two patients, each with 5 tumor size measurements (patient 1 had an increasing tumor size and patient 2 a decreasing one). The joint model used for the predictions was adjusted for sex (1: male, 2: female), treatment (1: sequential arm, 2: combined arm), WHO baseline performance status (1: 0 status, 2: 1 status, 3: 2 status) and previous resection of the primary tumor (0: no, 1: yes). The data frame for the biomarker measurements can be:

 datapredj_longi <- data.frame(id = 0, year = 0, tumor.size = 0, treatment = 0,
  age = 0, who.PS = 0, prev.resection = 0)
 datapredj_longi$treatment <- as.factor(datapredj_longi$treatment)
 levels(datapredj_longi$treatment) <- 1:2
 datapredj_longi$age <- as.factor(datapredj_longi$age)
 levels(datapredj_longi$age) <- 1:3
 datapredj_longi$who.PS <- as.factor(datapredj_longi$who.PS)
 levels(datapredj_longi$who.PS) <- 1:3
 datapredj_longi$prev.resection <- as.factor(datapredj_longi$prev.resection)
 levels(datapredj_longi$prev.resection) <- 1:2

 # patient 1: increasing tumor size
 datapredj_longi[1,] <- c(1, 0,1.2 ,2,1,1,1)
 datapredj_longi[2,] <- c(1,0.3,1.4,2,1,1,1)
 datapredj_longi[3,] <- c(1,0.6,1.9,2,1,1,1)
 datapredj_longi[4,] <- c(1,0.9,2.5,2,1,1,1)
 datapredj_longi[5,] <- c(1,1.5,3.9,2,1,1,1)

 # patient 2: decreasing tumor size
 datapredj_longi[6,] <- c(2, 0,1.2 ,2,1,1,1)
 datapredj_longi[7,] <- c(2,0.3,0.7,2,1,1,1)
 datapredj_longi[8,] <- c(2,0.5,0.3,2,1,1,1)
 datapredj_longi[9,] <- c(2,0.7,0.1,2,1,1,1)
 datapredj_longi[10,] <- c(2,0.9,0.1,2,1,1,1)

Value

The following components are included in a 'predFrailty' object obtained by using prediction function for Cox proportional hazard and shared frailty model.

- npred: Number of individual predictions
- x.time: A vector of prediction times of interest (used for plotting predictions): vector of prediction times t if fixed window. Otherwise vector of prediction times t+w
- window: Prediction window or vector of prediction windows
- pred: Predictions estimated for each profile
- icproba: Logical value. Were confidence intervals estimated?
- predLow: Lower limit of Monte-Carlo confidence interval for each prediction
- predHigh: Upper limit of Monte-Carlo confidence interval for each prediction
- type: Type of prediction probability (marginal or conditional)
- group: For conditional probability, the list of groups on which you make predictions

The following components are included in a 'predJoint' object obtained by using prediction function for joint frailty model.

- npred: Number of individual predictions
- x.time: A vector of prediction times of interest (used for plotting predictions): vector of prediction times t if fixed window. Otherwise vector of prediction times t+w
- window: Prediction window or vector of prediction windows
- group: Id of each patient
- pred1: Estimation of probability of type 1: exactly j recurrences
- pred2: Estimation of probability of type 2: at least j recurrences
- pred3: Estimation of probability of type 3
- pred1_rec: Estimation of prediction of relapse
- icproba: Logical value. Were confidence intervals estimated?
- predlow1: Lower limit of Monte-Carlo confidence interval for probability of type 1
- predhigh1: Upper limit of Monte-Carlo confidence interval for probability of type 1
- predlow2: Lower limit of Monte-Carlo confidence interval for probability of type 2
- predhigh2: Upper limit of Monte-Carlo confidence interval for probability of type 2
- predlow3: Lower limit of Monte-Carlo confidence interval for probability of type 3
- predhigh3: Upper limit of Monte-Carlo confidence interval for probability of type 3
- predhigh1_rec: Upper limit of Monte-Carlo confidence interval for prediction of relapse
- predlow1_rec: Lower limit of Monte-Carlo confidence interval for prediction of relapse

The following components are included in a 'predLongi' object obtained by using prediction function for joint models with longitudinal data.

- npred: Number of individual predictions
- x.time: A vector of prediction times of interest (used for plotting predictions): vector of prediction times t if fixed window. Otherwise vector of prediction times t+w
- window: Prediction window or vector of prediction windows
- group: Id of each patient
- pred: Estimation of probability
- icproba: Logical value. Were confidence intervals estimated?
- predLow: Lower limit of Monte-Carlo confidence intervals
- predHigh: Upper limit of Monte-Carlo confidence intervals
- trivariate: Logical value. Are the predictions calculated from the trivariate model?

References

A. Krol, L. Ferrer, JP. Pignon, C. Proust-Lima, M. Ducreux, O. Bouche, S. Michiels, V. Rondeau (2016). Joint Model for Left-Censored Longitudinal Data, Recurrent Events and Terminal Event: Predictive Abilities of Tumor Burden for Cancer Evolution with Application to the FFCD 2000-05 Trial. Biometrics 72(3) 907-16.

A. Mauguen, B. Rachet, S. Mathoulin-Pelissier, G. MacGrogan, A. Laurent, V. Rondeau (2013). Dynamic prediction of risk of death using history of cancer recurrences in joint frailty models. Statistics in Medicine, 32(30), 5366-80.

V. Rondeau, A. Laurent, A. Mauguen, P. Joly, C. Helmer (2015). Dynamic prediction models for clustered and interval-censored outcomes: investigating the intra-couple correlation in the risk of dementia. Statistical Methods in Medical Research