predRes: Evaluation of the prediction accuracy of a prediction model
In Oncostat/biospear: Biomarker selection in penalized regression models



This function computes several criteria to assess the prediction accuracy of a prediction model.

predRes(res, method, traindata, newdata, int.cv, int.cv.nfold = 5, time,
  trace = TRUE, ncores = 1)

## S3 method for class 'predRes'
plot(x, method, crit = c("C", "PE", "dC"),
  xlim, ylim, xlab, ylab, col, ...)

`res`	an object of class '`resBMsel`' generated by `BMsel`.
`method`	methods for which prediction criteria are computed. If missing, all methods contained in `res` are computed.
`traindata`	input `data.frame` used to compute the `res` object. This object is mandatory.
`newdata`	input `data.frame` not used to compute the `res` object. This object is not mandatory (see Details section).
`int.cv`	logical parameter indicating whether a double cross-validation process (2CV) should be performed to mimic an external validation set.
`int.cv.nfold`	number of folds for the double cross-validation. Large values of `int.cv.nfold` can lead to very long computation times. `int.cv.nfold` is not used when `int.cv = FALSE`.
`time`	time points to compute the prediction criteria.
`trace`	logical parameter indicating if messages should be printed.
`ncores`	number of CPUs used (for the double cross-validation).
`x`	an object of class '`predRes`' generated from `predRes`.
`crit`	parameter indicating the criterion to plot (`C`: concordance via Uno's C-statistic, `PE`: prediction error via the integrated Brier score, and `dC`: delta Uno's C-statistic, for the interaction setting only).
`xlim, ylim, xlab, ylab, col`	usual parameters for plot.
`...`	other parameters for plot.

To evaluate the accuracy of the selected models, three predictive accuracy measures are implemented:
- the integrated Brier score (PE) to measure the overall prediction error of the prediction model. The time-dependent Brier score is a quadratic score based on the predicted time-dependent survival probability.
- Uno's C-statistic (C) to evaluate the discrimination of the prediction model. It is one of the least biased estimators of the concordance statistic in the presence of censoring (Uno et al., 2011).
- the absolute difference between the treatment-specific Uno's C-statistics (dC) to evaluate the interaction strength of the prediction model (Ternes et al., 2016).
For simulated datasets, the predictive accuracy metrics are also computed for the "oracle model", that is, the unpenalized Cox proportional hazards model fitted to the active biomarkers only.
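
For intuition about the first and third measures, here is a minimal base R sketch. It is a simplification that ignores censoring, so it is not how `predRes` computes its criteria. The helper names `brier_at`, `integrated_brier` and `delta_c` are hypothetical and are not part of biospear, the trapezoidal integration is an assumption, and the input values are toy numbers.

```r
# Simplified sketch, for intuition only: no censoring adjustment.
# brier_at(), integrated_brier() and delta_c() are hypothetical helpers,
# not biospear functions.

# Brier score at time t: mean squared gap between the observed status at t
# (1 = still event-free) and the predicted survival probability at t.
brier_at <- function(obs_time, surv_prob, t) {
  event_free <- as.numeric(obs_time > t)
  mean((event_free - surv_prob)^2)
}

# Integrated Brier score: trapezoidal average of the score over a time grid
# (the integration rule is an assumption made for this sketch).
integrated_brier <- function(times, scores) {
  n <- length(times)
  widths <- diff(times)
  heights <- (scores[-n] + scores[-1]) / 2
  sum(widths * heights) / (times[n] - times[1])
}

# dC: absolute difference between two treatment-specific C-statistics.
delta_c <- function(c_treated, c_control) abs(c_treated - c_control)

obs_time  <- c(0.5, 1.2, 2.0, 3.5)  # toy event times, all uncensored
surv_prob <- c(0.2, 0.4, 0.7, 0.9)  # toy predicted P(T > 1) per subject
bs1 <- brier_at(obs_time, surv_prob, t = 1)
ibs <- integrated_brier(times = c(1, 2, 3), scores = c(0.1, 0.2, 0.3))
dc  <- delta_c(0.70, 0.62)
```

A lower Brier score indicates better overall prediction, while a larger dC indicates that the model separates risk differently in the two treatment arms.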

A list with one element per time point in `time`. Each element contains between 1 and 3 sublists, depending on the validation performed: training set (always computed), internal validation through double cross-validation (2CV, if `int.cv = TRUE`) and/or external validation (if `newdata` is provided). Each sublist is a matrix containing the predictive accuracy metrics of the implemented methods.
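
The nesting described above can be pictured with a mock object. This is only a sketch: `mock` is built by hand and is not produced by `predRes`, and the sublist names (`training`, `external`) and the row/column orientation of the matrices are illustrative assumptions, not the names the package actually uses. Values are left as `NA` on purpose.

```r
# Mock object mirroring the documented layout; NOT produced by predRes().
# Sublist names and matrix orientation are illustrative assumptions.
methods <- c("lasso", "lasso-pcvl")
empty <- matrix(NA_real_, nrow = 2, ncol = length(methods),
                dimnames = list(c("C", "PE"), methods))

mock <- list(
  "1" = list(training = empty, external = empty),  # time point 1
  "2" = list(training = empty, external = empty)   # time point 2
)

n_times <- length(mock)                    # one element per time point
schemes <- names(mock[["1"]])              # validation schemes at time 1
mat_dim <- dim(mock[["1"]]$training)       # metrics by methods
```

On a real result, calling `str()` with `max.level = 2` on the returned object should reveal the actual names and dimensions.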

Nils Ternes, Federico Rotolo, and Stefan Michiels
Maintainer: Nils Ternes nils.ternes@yahoo.com

Ternes N, Rotolo F and Michiels S. Empirical extensions of the lasso penalty to reduce the false discovery rate in high-dimensional Cox regression models. Statistics in Medicine 2016;35(15):2561-2573. doi:10.1002/sim.6927
Ternes N, Rotolo F, Heinze G and Michiels S. Identification of biomarker-by-treatment interactions in randomized clinical trials with survival outcomes and high-dimensional spaces. Biometrical Journal. In press. doi:10.1002/bimj.201500234
Uno H, Cai T, Pencina MJ, D'Agostino RB and Wei LJ. On the C-statistics for evaluating overall adequacy of risk prediction procedures with censored survival data. Statistics in Medicine 2011;30:1105-1117. doi:10.1002/sim.4154

########################################
# Simulated data set
########################################

## Low calculation time
  set.seed(654321)
  sdata <- simdata(
    n = 500, p = 20, q.main = 3, q.inter = 0,
    prob.tt = 0.5, alpha.tt = 0,
    beta.main = -0.8,
    b.corr = 0.6, b.corr.by = 4,
    m0 = 5, wei.shape = 1, recr = 4, fu = 2,
    timefactor = 1)
  
  newdata <- simdataV(
    traindata = sdata,
    Nvalid = 500
  )
   
  resBM <- BMsel(
    data = sdata, 
    method = c("lasso", "lasso-pcvl"), 
    inter = FALSE, 
    folds = 5)
  
  predAcc <- predRes(
    res = resBM,
    traindata = sdata,
    newdata = newdata,
    time = 1:5)
    
  plot(predAcc, crit = "C")

## Not run: 
## Moderate calculation time
  set.seed(123456)
  sdata <- simdata(
    n = 500, p = 100, q.main = 5, q.inter = 5,
    prob.tt = 0.5, alpha.tt = -0.5,
    beta.main = c(-0.5, -0.2), beta.inter = c(-0.7, -0.4),
    b.corr = 0.6, b.corr.by = 10,
    m0 = 5, wei.shape = 1, recr = 4, fu = 2,
    timefactor = 1,
    active.inter = c("bm003", "bm021", "bm044", "bm049", "bm097"))

  resBM <- BMsel(
    data = sdata, 
    method = c("lasso", "lasso-pcvl"), 
    inter = TRUE, 
    folds = 5)
  
  predAcc <- predRes(
    res = resBM,
    traindata = sdata, 
    int.cv = TRUE, 
    time = 1:5, 
    ncores = 5)
  plot(predAcc, crit = "dC")

## End(Not run)

########################################
# Breast cancer data set
########################################

## Not run: 
  data(Breast)
  dim(Breast)
  
  set.seed(123456)
  resBM <-  BMsel(
    data = Breast,
    x = 4:ncol(Breast),
    y = 2:1,
    tt = 3,
    inter = FALSE,
    std.x = TRUE,
    folds = 5,
    method = c("lasso", "lasso-pcvl"))

  summary(resBM)

  predAcc <- predRes(
    res = resBM,
    traindata = Breast,
    time = 1:4,
    trace = TRUE)
  plot(predAcc, crit = "C")

## End(Not run)
