pec: Prediction error curves
In pec: Prediction Error Curves for Risk Prediction Models in Survival Analysis

pec	R Documentation

Prediction error curves

Description

Evaluating the performance of risk prediction models in survival analysis. The Brier score is a weighted average of the squared distances between the observed survival status and the predicted survival probability of a model. Roughly, the weights correspond to the probabilities of not being censored. The weights can be estimated depending on covariates. Prediction error curves are obtained when the Brier score is followed over time. Cross-validation based on bootstrap resampling or bootstrap subsampling can be applied to assess and compare the predictive power of various regression modelling strategies on the same set of data.

Usage

pec(
  object,
  formula,
  data,
  traindata,
  times,
  cause,
  start,
  maxtime,
  exact = TRUE,
  exactness = 100,
  fillChar = NA,
  cens.model = "cox",
  ipcw.refit = FALSE,
  ipcw.args = NULL,
  splitMethod = "none",
  B,
  M,
  reference = TRUE,
  model.args = NULL,
  model.parms = NULL,
  keep.index = FALSE,
  keep.matrix = FALSE,
  keep.models = FALSE,
  keep.residuals = FALSE,
  keep.pvalues = FALSE,
  noinf.permute = FALSE,
  multiSplitTest = FALSE,
  testIBS,
  testTimes,
  confInt = FALSE,
  confLevel = 0.95,
  verbose = TRUE,
  savePath = NULL,
  slaveseed = NULL,
  na.action = na.fail,
  ...
)

Arguments

`object`	A named list of prediction models, where allowed entries are (1) R-objects for which a predictSurvProb method exists (see details), (2) a `call` that evaluates to such an R-object (see examples), (3) a matrix with predicted probabilities having as many rows as `data` and as many columns as `times`. For cross-validation all objects in this list must include their `call`.
`formula`	A survival formula as obtained either with `prodlim::Hist` or `survival::Surv`. The left hand side is used to find the status response variable in `data`. For right censored data, the right hand side of the formula is used to specify conditional censoring models. For example, set `Surv(time,status)~x1+x2` and `cens.model="cox"`. Then the weights are based on a Cox regression model for the censoring times with predictors x1 and x2. Note that the usual coding is assumed, i.e. `status=0` for censored times, and that each variable name that appears in `formula` must be a column name in `data`. If there are no covariates, i.e. `formula=Surv(time,status)~1`, the `cens.model` is coerced to `"marginal"` and the Kaplan-Meier estimator for the censoring times is used to calculate the weights. If `formula` is missing, the function tries to extract a formula from the first element in `object`.
`data`	A data frame in which to validate the prediction models and to fit the censoring model. If `data` is missing, the function tries to extract a data set from the first element in `object`.
`traindata`	A data frame in which the models are trained. This argument is used only in the absence of cross-validation, in which case it is passed to the predictHandler function predictSurvProb.
`times`	A vector of time points. At each time point the prediction error curves are estimated. If `exact==TRUE` the `times` are merged with all the unique values of the response variable. If `times` is missing and `exact==TRUE` all the unique values of the response variable are used. If `times` is missing and `exact==FALSE` an equidistant grid of values between `start` and `maxtime` is used. The distance is determined by `exactness`.
`cause`	For competing risks, the event of interest. Defaults to the first state of the response, which is obtained by evaluating the left hand side of `formula` in `data`.
`start`	Minimal time for estimating the prediction error curves. If missing and `formula` defines a `Surv` or `Hist` object then `start` defaults to `0`, otherwise to the smallest observed value of the response variable. `start` is ignored if `times` are given.
`maxtime`	Maximal time for estimating the prediction error curves. If missing the largest value of the response variable is used.
`exact`	Logical. If `TRUE` estimate the prediction error curves at all the unique values of the response variable. If `times` are given and `exact=TRUE` then the `times` are merged with the unique values of the response variable.
`exactness`	An integer that determines how many equidistant gridpoints are used between `start` and `maxtime`. The default is 100.
`fillChar`	Symbol used to fill-in places where the values of the prediction error curves are not available. The default is `NA`.
`cens.model`	Method for estimating inverse probability of censoring weights. `cox`: A semi-parametric Cox proportional hazard model is fitted to the censoring times. `marginal`: The Kaplan-Meier estimator for the censoring times. `nonpar`: Nonparametric extension of the Kaplan-Meier for the censoring times using symmetric nearest neighborhoods. It is available for arbitrarily many strata variables on the right hand side of argument `formula` but at most one continuous variable. See the documentation of the functions `prodlim` and `neighborhood` from the prodlim package. `aalen`: The nonparametric Aalen additive model fitted to the censoring times. Requires the `timereg` package.
`ipcw.refit`	If `TRUE` the inverse probability of censoring weights are estimated separately in each training set during cross-validation.
`ipcw.args`	List of arguments passed to function specified by argument `cens.model`.
`splitMethod`	Method for estimating the prediction error curves. `none/noPlan`: Assess the models in the same data where they are fitted. `boot`: DEPRECATED. `cvK`: K-fold cross-validation, e.g. `cv10` for 10-fold cross-validation. After splitting the data in K subsets, the prediction models (i.e. those specified in `object`) are trained on the data omitting the Kth subset (training step). The prediction error is estimated with the Kth subset (validation step). The random splitting is repeated `B` times and the estimated prediction error curves are obtained by averaging. `BootCv`: Bootstrap cross-validation. The prediction models are trained on `B` bootstrap samples, which are drawn either with replacement and of the same size as the original data, or without replacement from `data` and of size `M`. The models are assessed in the observations that are NOT in the bootstrap sample. `Boot632`: Linear combination of AppErr and BootCvErr using the constant weight .632. `Boot632plus`: Linear combination of AppErr and BootCvErr using weights dependent on how the models perform in permuted data. `loocv`: Leave-one-out cross-validation. `NoInf`: Assess the models in permuted data.
`B`	Number of bootstrap samples. The default depends on argument `splitMethod`. When `splitMethod` is one of `"BootCv"`, `"Boot632"` or `"Boot632plus"` the default is 100. For `splitMethod="cvK"`, `B` is the number of cross-validation cycles and the default is 1. For `splitMethod="none"`, `B` is the number of bootstrap simulations, e.g. to obtain bootstrap confidence limits, and the default is 0.
`M`	The size of the bootstrap samples for resampling without replacement. Ignored for resampling with replacement.
`reference`	Logical. If `TRUE` add the marginal Kaplan-Meier prediction model as a reference to the list of models.
`model.args`	List of extra arguments that can be passed to the `predictSurvProb` methods. The list must have an entry for each entry in `object`.
`model.parms`	Experimental. List with exactly one entry for each entry in `object`. Each entry names parts of the value of the fitted models that should be extracted and added to the value.
`keep.index`	Logical. If `FALSE` remove the bootstrap or cross-validation index from the output list which otherwise is included in the splitMethod part of the output list.
`keep.matrix`	Logical. If `TRUE` add all `B` prediction error curves from bootstrapping or cross-validation to the output.
`keep.models`	Logical. If `TRUE` keep the models in object. Since fitted models can be large objects the default is `FALSE`.
`keep.residuals`	Logical. If `TRUE` keep the patient individual residuals at `testTimes`.
`keep.pvalues`	For `multiSplitTest`. If `TRUE` keep the pvalues from the single splits.
`noinf.permute`	If `TRUE` the noinformation error is approximated using permutation.
`multiSplitTest`	If `TRUE` the test proposed by van de Wiel et al. (2009) is applied. Requires subsampling bootstrap cross-validation, i.e. that `splitMethod` equals `BootCv` and that `M` is specified.
`testIBS`	A range of time points for testing differences between models in the integrated Brier scores.
`testTimes`	A vector of time points for testing differences between models in the time-point specific Brier scores.
`confInt`	Experimental.
`confLevel`	Experimental.
`verbose`	If `TRUE` report details of the progress, e.g. count the steps in cross-validation.
`savePath`	Place in your file system (i.e., a directory on your computer) where training models fitted during cross-validation are saved. If `missing` training models are not saved.
`slaveseed`	Vector of seeds, as long as `B`, to be given to the slaves in parallel computing.
`na.action`	Passed immediately to model.frame. Defaults to na.fail. If set otherwise most prediction models will not work.
`...`	Not used.
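
The interplay of `times`, `exact`, `start`, `maxtime` and `exactness` described above can be illustrated with a small sketch. The helper `eval_times` is hypothetical and not part of pec; in particular, reading "the distance is determined by `exactness`" as a step of `(maxtime - start)/exactness` is an assumption, and the internal code of `pec` may differ in detail.

```r
# Hypothetical helper (not a pec function): build the time points at which
# prediction error curves would be evaluated, following the argument text.
eval_times <- function(response, times = NULL, exact = TRUE,
                       start = 0, maxtime = max(response), exactness = 100) {
  if (exact) {
    # merge user-supplied times with all unique response values
    grid <- sort(unique(c(times, response)))
  } else if (is.null(times)) {
    # equidistant grid; the step size is an assumption (see lead-in)
    grid <- seq(start, maxtime, by = (maxtime - start) / exactness)
  } else {
    # times given and exact = FALSE: use the supplied times only
    grid <- sort(unique(times))
  }
  grid[grid <= maxtime]
}

resp <- c(1, 4, 4, 9)
merged_grid <- eval_times(resp, times = c(2, 4))
coarse_grid <- eval_times(resp, exact = FALSE, exactness = 3)
```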

Details

Note that package riskRegression provides very similar functionality (and much more) but not yet every feature of pec.

Missing data in the response or in the input matrix cause a failure.

The status of the continuous response variable at cutpoints (times), i.e. status=1 if the response value exceeds the cutpoint and status=0 otherwise, is compared to predicted event status probabilities which are provided by the prediction models on the basis of covariates. The comparison is done with the Brier score: the quadratic difference between 0-1 response status and predicted probability.
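
As a minimal sketch of this comparison, ignoring censoring entirely, the Brier score at each cutpoint is the mean squared difference between the 0-1 status and the predicted probability. The data and the matrix `pred` below are made up for illustration; `pec` itself additionally applies censoring weights.

```r
# Toy data without censoring: four observed event times and predicted
# survival probabilities at three cutpoints (rows = subjects, cols = cutpoints).
time <- c(2, 5, 7, 9)
cutpoints <- c(3, 6, 8)
pred <- matrix(c(0.4, 0.2, 0.1,
                 0.8, 0.5, 0.3,
                 0.9, 0.7, 0.4,
                 0.9, 0.8, 0.6), nrow = 4, byrow = TRUE)

# status = 1 if the response exceeds the cutpoint, 0 otherwise
status <- outer(time, cutpoints, ">") * 1

# Brier score per cutpoint: mean squared difference of status and prediction
brier <- colMeans((status - pred)^2)
```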

There are two different sources for bias when estimating prediction error in right censored survival problems: censoring and high flexibility of the prediction model. The first is controlled by inverse probability of censoring weighting, the second can be controlled by special Monte Carlo simulation. In each step, the resampling procedures reevaluate the prediction model. Technically this is done by replacing the argument object$call$data by the current subset or bootstrap sample of the full data.
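
For orientation, the inverse probability of censoring weighted Brier score at time t is commonly written as follows (see Graf et al. 1999 and Gerds and Schumacher 2006). The notation is introduced here for illustration and is not taken from the package: observed time \tilde T_i, event indicator \Delta_i, predicted survival probability \hat S(t | X_i), and estimated conditional survival function of the censoring times \hat G.

```latex
\[
\widehat{\mathrm{BS}}(t) = \frac{1}{n}\sum_{i=1}^{n}
\left[
  \frac{\{0-\hat S(t\mid X_i)\}^{2}\,
        \mathbb{1}\{\tilde T_i\le t,\ \Delta_i=1\}}
       {\hat G(\tilde T_i-\mid X_i)}
  +
  \frac{\{1-\hat S(t\mid X_i)\}^{2}\,
        \mathbb{1}\{\tilde T_i> t\}}
       {\hat G(t\mid X_i)}
\right]
\]
```

Subjects censored before t receive weight zero. With `cens.model="marginal"` the weights do not depend on X_i and \hat G is the Kaplan-Meier estimator for the censoring times.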

For each prediction model there must be a predictSurvProb method: for example, to assess a prediction model which evaluates to a myclass object one defines a function called predictSurvProb.myclass with arguments object,newdata,cutpoints,...

Such a function takes the object and derives a matrix with predicted event status probabilities for each subject in newdata (rows) and each cutpoint (column) of the response variable that defines an event status.
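
A minimal sketch of such a method follows. The class `expfit` and the fitting function `fit_exp` are hypothetical and only serve as illustration. The third argument is named `times` here on the assumption that this matches the signature of the `predictSurvProb` generic in current versions of pec; check `args(predictSurvProb)` in your installation.

```r
# Hypothetical toy model: constant hazard estimated from the data,
# ignoring all covariates.
fit_exp <- function(time, status) {
  structure(list(rate = sum(status) / sum(time)), class = "expfit")
}

# S3 method: returns a matrix of predicted survival probabilities with one
# row per subject in newdata and one column per requested time point.
predictSurvProb.expfit <- function(object, newdata, times, ...) {
  surv <- exp(-object$rate * times)
  matrix(surv, nrow = NROW(newdata), ncol = length(times), byrow = TRUE)
}

fit <- fit_exp(time = c(2, 5, 7, 9), status = c(1, 0, 1, 1))
newdata <- data.frame(id = 1:3)
p <- predictSurvProb.expfit(fit, newdata = newdata, times = c(1, 5))
```

With pec loaded, a call such as `predictSurvProb(fit, newdata, times)` should dispatch to this method. Since the toy object does not store its `call`, it would presumably be usable only with `splitMethod="none"`.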

Currently, predictSurvProb methods are readily available for various survival models, see methods(predictSurvProb).

Value

A pec object. See also the help pages of the corresponding print, summary, and plot methods. The object includes the following components:

`PredErr`	The estimated prediction error according to the `splitMethod`. A matrix where each column represents the estimated prediction error of a fit at the time points in `time`.
`AppErr`	The training error or apparent error obtained when the model(s) are evaluated in the same data where they were trained. Only if `splitMethod` is one of "NoInf", "cvK", "BootCv", "Boot632" or "Boot632plus".
`BootCvErr`	The prediction error when the model(s) are trained in the bootstrap sample and evaluated in the data that are not in the bootstrap sample. Only if `splitMethod` is one of "Boot632" or "Boot632plus". When `splitMethod="BootCv"` then the `BootCvErr` is stored in the component `PredErr`.
`NoInfErr`	The prediction error when the model(s) are evaluated in the permuted data. Only if `splitMethod` is one of "BootCv", "Boot632", or "Boot632plus". For `splitMethod="NoInf"` the `NoInfErr` is stored in the component `PredErr`.
`weight`	The weight used to linearly combine the `AppErr` and the `BootCvErr`. Only if `splitMethod` is one of "Boot632", or "Boot632plus".
`overfit`	Estimated `overfit` of the model(s). See Efron and Tibshirani (1997, Journal of the American Statistical Association) and Gerds and Schumacher (2007, Biometrics). Only if `splitMethod` is one of "Boot632", or "Boot632plus".
`call`	The call that produced the object.
`time`	The time points at which the prediction error curves change.
`ipcw.fit`	The fitted censoring model that was used for re-weighting the Brier score residuals. See Gerds and Schumacher (2006, Biometrical Journal).
`n.risk`	The number of subjects at risk for all time points.
`models`	The prediction models fitted in their own data.
`cens.model`	The censoring models.
`maxtime`	The latest timepoint where the prediction error curves are estimated.
`start`	The earliest timepoint where the prediction error curves are estimated.
`exact`	`TRUE` if the prediction error curves are estimated at all unique values of the response in the full data.
`splitMethod`	The splitMethod used for estimation of the overfitting bias.
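
How `AppErr`, `BootCvErr`, `NoInfErr`, `overfit` and `weight` relate can be sketched with the formulas of Efron and Tibshirani (1997). The functions `boot632` and `boot632plus` below are hypothetical helpers written for illustration. Clamping the relative overfit to the interval [0, 1] is an assumption, and the handling of edge cases inside `pec` may differ.

```r
# .632 estimate: fixed linear combination of apparent and bootstrap CV error
boot632 <- function(AppErr, BootCvErr) {
  0.368 * AppErr + 0.632 * BootCvErr
}

# .632+ estimate: the weight depends on the relative overfit, which compares
# the optimism (BootCvErr - AppErr) with the no-information optimism.
boot632plus <- function(AppErr, BootCvErr, NoInfErr) {
  overfit <- (BootCvErr - AppErr) / (NoInfErr - AppErr)
  # assumption: restrict to [0, 1]; yields NaN where NoInfErr equals AppErr
  overfit <- pmin(pmax(overfit, 0), 1)
  weight <- 0.632 / (1 - 0.368 * overfit)
  list(overfit = overfit,
       weight = weight,
       PredErr = (1 - weight) * AppErr + weight * BootCvErr)
}

# Made-up error values at a single time point
e632 <- boot632(AppErr = 0.10, BootCvErr = 0.20)
e632plus <- boot632plus(AppErr = 0.10, BootCvErr = 0.20, NoInfErr = 0.25)
```

The arguments may also be vectors over the time points, in which case the combination is applied pointwise along the curve.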

Author(s)

Thomas Alexander Gerds tag@biostat.ku.dk

References

Gerds TA, Kattan MW. Medical Risk Prediction Models: With Ties to Machine Learning. Chapman and Hall/CRC https://www.routledge.com/9781138384477

Ulla B. Mogensen, Hemant Ishwaran, Thomas A. Gerds (2012). Evaluating Random Forests for Survival Analysis Using Prediction Error Curves. Journal of Statistical Software, 50(11), 1-23. DOI 10.18637/jss.v050.i11

E. Graf et al. (1999), Assessment and comparison of prognostic classification schemes for survival data. Statistics in Medicine, vol 18, pp. 2529–2545.

Efron, Tibshirani (1997), Improvement On Cross-Validation: The .632+ Bootstrap Method. Journal of the American Statistical Association, vol 92, 548–560.

Gerds, Schumacher (2006), Consistent estimation of the expected Brier score in general survival models with right-censored event times. Biometrical Journal, vol 48, 1029–1040.

Thomas A. Gerds, Martin Schumacher (2007), Efron-Type Measures of Prediction Error for Survival Analysis. Biometrics, 63(4), 1283–1287. doi:10.1111/j.1541-0420.2007.00832.x

Martin Schumacher, Harald Binder, and Thomas Gerds. Assessment of survival prediction models based on microarray data. Bioinformatics, 23(14):1768-74, 2007.

Mark A. van de Wiel, Johannes Berkhof, and Wessel N. van Wieringen (2009), Testing the prediction error difference between 2 predictors. Biostatistics, 10(3), 550–560. doi:10.1093/biostatistics/kxp011

Examples


# simulate an artificial data frame
# with survival response and two predictors

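set.seed(130971)
library(prodlim)   # provides SimSurv
library(survival)  # provides coxph and Surv
library(pec)       # provides pec and its print, summary and plot methods
dat <- SimSurv(100)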

# fit some candidate Cox models and compute the Kaplan-Meier estimate 

Models <- list("Cox.X1"=coxph(Surv(time,status)~X1,data=dat,x=TRUE,y=TRUE),
              "Cox.X2"=coxph(Surv(time,status)~X2,data=dat,x=TRUE,y=TRUE),
              "Cox.X1.X2"=coxph(Surv(time,status)~X1+X2,data=dat,x=TRUE,y=TRUE))

# compute the apparent prediction error 
PredError <- pec(object=Models,
                  formula=Surv(time,status)~X1+X2,
                  data=dat,
                  exact=TRUE,
                  cens.model="marginal",
                  splitMethod="none",
                  B=0,
                  verbose=TRUE)

print(PredError,times=seq(5,30,5))
summary(PredError)
plot(PredError,xlim=c(0,30))

# Comparison of Weibull model and Cox model
library(survival)
library(rms)
library(pec)
data(pbc)
pbc <- pbc[sample(1:NROW(pbc),size=100),]
f1 <- psm(Surv(time,status!=0)~edema+log(bili)+age+sex+albumin,data=pbc)
f2 <- coxph(Surv(time,status!=0)~edema+log(bili)+age+sex+albumin,data=pbc,x=TRUE,y=TRUE)
f3 <- cph(Surv(time,status!=0)~edema+log(bili)+age+sex+albumin,data=pbc,surv=TRUE)
brier <- pec(list("Weibull"=f1,"CoxPH"=f2,"CPH"=f3),data=pbc,formula=Surv(time,status!=0)~1)
print(brier)
plot(brier)

# compute the .632+ estimate of the generalization error
set.seed(130971)
library(prodlim)
library(survival)
dat <- SimSurv(100)
set.seed(17100)
PredError.632plus <- pec(object=Models,
                  formula=Surv(time,status)~X1+X2,
                  data=dat,
                  exact=TRUE,
                  cens.model="marginal",
                  splitMethod="Boot632plus",
                  B=3,
                  verbose=TRUE)

print(PredError.632plus,times=seq(4,12,4))
summary(PredError.632plus)
plot(PredError.632plus,xlim=c(0,30))
# do the same again but now in parallel
## Not run: 
set.seed(17100)
# library(doMC)
# registerDoMC()
PredError.632plus <- pec(object=Models,
                  formula=Surv(time,status)~X1+X2,
                  data=dat,
                  exact=TRUE,
                  cens.model="marginal",
                  splitMethod="Boot632plus",
                  B=3,
                  verbose=TRUE)

## End(Not run)
# assessing parametric survival models in learn/validation setting
learndat <- SimSurv(50)
testdat <- SimSurv(30)
library(survival)
library(rms)
f1 <- psm(Surv(time,status)~X1+X2,data=learndat)
f2 <- psm(Surv(time,status)~X1,data=learndat)
pf <- pec(list(f1,f2),formula=Surv(time,status)~1,data=testdat,maxtime=200)
plot(pf)
summary(pf)

# ---------------- competing risks -----------------

library(survival)
library(riskRegression)
if(requireNamespace("cmprsk",quietly=TRUE)){
library(cmprsk)
library(pec)
cdat <- SimCompRisk(100)
f1  <- CSC(Hist(time,event)~X1+X2,cause=2,data=cdat)
f2  <- CSC(Hist(time,event)~X1,data=cdat,cause=2)
f3  <- FGR(Hist(time,event)~X1+X2,cause=2,data=cdat)
f4  <- FGR(Hist(time,event)~X1+X2,cause=2,data=cdat)
p1 <- pec(list(f1,f2,f3,f4),formula=Hist(time,event)~1,data=cdat,cause=2)
}

pec documentation built on April 11, 2023, 5:55 p.m.
