In mdbrown/survMarkerTwoPhase: Evaluate the prognostic accuracy of a marker under two phase designs

survMarkerTwoPhase

This R package computes non-parametric and semi-parametric estimates of accuracy measures for risk prediction markers from survival data under two phase study designs, namely the case-cohort (cch) and the nested case-control (ncc) study designs.

The accuracy measures that can be estimated include the AUC, TPR(c), FPR(c), PPV(c), and NPV(c) for for a specific timepoint and fixed marker cutoff value c. Standard errors, along with normal approximated confidence intervals are also calculated. Note: standard errors for non-parametric estimates from a cch design are currently not available.

Below is a brief tutorial to get you started. For more information regarding estimation procedures see the references below.

Tutorial

Download and install the package

#download the package from github
if (!require("devtools")) install.packages("devtools")
devtools::install_github("mdbrown/survMarkerTwoPhase")

library(survMarkerTwoPhase)

#simulated data for illustration
data(SimData)  
head(SimData)

Case-cohort design

For the case-cohort design we sample n=150 from entire cohort, and include all participants with observed failures.

#generate a sub-cohort from SimData
set.seed(12321)
#create a sample index. 1 if sampled, 0 if not
N <- nrow(SimData)
sampleInd <- rep(0, N)

# sample all with observed failure time. (200 individuals)
sampleInd[SimData$status==1] <- 1

#sample 150 more observations from the entire data set without replacement
sampleInd[sample(1:N, 150)] <- 1

table(sampleInd) #total number of subcohort is 293

To estimate accuracy measures for the case-cohort design, sample weights must be calculated. The sample weights w = 1/Pr(Sampled from cohort) for each observation included in the sub-cohort.

cohortData_cch <- SimData

#first calculate the Pr(Sampled from cohort) for each observation
sampleProb <- numeric(500)

#all non-censored observations were sampled, so their sample probability is 1
sampleProb[cohortData_cch$status==1] <- 1 
#all other individuals had a 150/N chance to be sampled 
sampleProb[cohortData_cch$status==0] <- 150/N

#the sample weights are 1/(probability of being sampled)
cohortData_cch$weights <- 1/sampleProb

#indicator of inclusion in the subcohort
cohortData_cch$subcohort = sampleInd  

#marker data is unavailable for those not in the subcohort
cohortData_cch$Y[sampleInd==0] = NA

Here we estimate using non-parametric estimation methods using double inverse probability weighting.

#estimate accuracy measures using non-parametric estimates
#by setting estimation.method = "NP"
survMTP.cch(time =survTime, 
            event = status, 
            marker = Y,
            weights = weights,
            subcoh = subcohort,
            data = cohortData_cch, 
            estimation.method = "NP",  
            predict.time = 2, 
            marker.cutpoint = 0)

Now estimate measures using semi-parametric estimation methods. These methods assume a Cox proportional hazards model.

#estimate accuracy measures using semi-parametric estimates
survMTP.cch(time =survTime, 
            event = status, 
            marker = Y,
            weights = weights,
            subcoh = subcohort,
            data = cohortData_cch, 
            estimation.method = "SP",  
            predict.time = 2, 
            marker.cutpoint = 0)

For more information see ?survMTP.cch.

Nested case-control

For a nested case-control sample, we need to input the full cohort data, with an indicator for inclusion in the subcohort, as well as a dataframe of risk sets, where each row is a case id followed by the corresponding id's that were matched to it.

Generate a nested case-control subcohort using the function ccwc from the Epi package.

require('Epi')

#2 matched controls for each case
nmatch = 2

cohortData_ncc <- SimData
cohortData_ncc$id = 1:dim(cohortData_ncc)[1]

subcohort_ncc <- ccwc(exit = survTime, fail= status, data = cohortData_ncc, controls = nmatch) # match 2 controls for each case. 

#indicator for inclusion in the subcohort
sampleInd = rep(0, nrow(cohortData_ncc)) #initialize all equal to zero
sampleInd[subcohort_ncc$Map] = 1

cohortData_ncc$subcohort = sampleInd  

table(sampleInd) #subcohort sample size of 352

#marker data is unavailable for those not in the subcohort
cohortData_ncc$Y[sampleInd==0] = NA  

#now we need to build the set matrix,
#which will be dimension (# of cases) x (nmatch + 1), so 200x3 here
#each row denotes a selected set, with the case id followed by the matched control ids
Sets <- matrix(nrow = sum(cohortData_ncc$status), ncol = nmatch +1)

for(i in subcohort_ncc$Set){
  Sets[i, ] <- unlist(subset(subcohort_ncc, Set==i, select=Map))
}

Now we are ready to calculate the measures using survMTP.ncc. Standard errors for both SP and NP estimates are calculated using perturbation. NP estimates use kernel smoothing while SP estimates assume a Cox proportional hazards model.

#Nonparametric estimates 
survMTP.ncc(time = survTime, 
         event = status, 
         marker=Y, 
         subcoh=subcohort, 
         id = id, 
         data = cohortData_ncc,
         sets = Sets, 
         estimation.method = "NP", 
         predict.time = 2 , 
         marker.cutpoint = 0)

#Semiparametric estimates 
survMTP.ncc(time = survTime, 
         event = status, 
         marker=Y, 
         subcoh=subcohort, 
         id = id, 
         data = cohortData_ncc,
         sets = Sets, 
         estimation.method = "SP", 
         predict.time = 2 , 
         marker.cutpoint = 0)

See ?survMTP.ncc for further help.

References

Liu D, Cai T, Zheng Y. Evaluating the predictive value of biomarkers with stratified case-cohort design. Biometrics 2012, 4: 1219-1227.
Pepe MS, Zheng Y, Jin Y. Evaluating the ROC performance of markers for future events. Lifetime Data Analysis. 2008, 14: 86-113.
Zheng Y, Cai T, Pepe MS, Levy, W. Time-dependent predictive values of prognostic biomarkers with failure time outcome. JASA 2008, 103: 362-368.
Cai T. and Zheng Y . Resampling Procedures for Making Inference under Nested Case-control Studies. JASA 2013 (in press).
Cai T and Zheng Y/, Evaluating Prognostic Accuracy of Biomarkers under nested case-control studies. Biostatistics* 2012, 13,1, 89-100.

(* equal contributor and corresponding author).

mdbrown/survMarkerTwoPhase documentation built on May 22, 2019, 3:23 p.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

mdbrown/survMarkerTwoPhase
Evaluate the prognostic accuracy of a marker under two phase designs

In mdbrown/survMarkerTwoPhase: Evaluate the prognostic accuracy of a marker under two phase designs

survMarkerTwoPhase

Tutorial

Download and install the package

Case-cohort design

Nested case-control

References

R Package Documentation

Browse R Packages

We want your feedback!

mdbrown/survMarkerTwoPhase Evaluate the prognostic accuracy of a marker under two phase designs

In mdbrown/survMarkerTwoPhase: Evaluate the prognostic accuracy of a marker under two phase designs

survMarkerTwoPhase

Tutorial

Download and install the package

Case-cohort design

Nested case-control

References

R Package Documentation

Browse R Packages

We want your feedback!

mdbrown/survMarkerTwoPhase
Evaluate the prognostic accuracy of a marker under two phase designs