Miscellaneous-calcAUPRC: Area under the PR curve
In bozenne/MRIaggr: Management, Display, and Processing of Medical Imaging Data


Compute the area under the precision recall curve by numerical integration.

calcAUPRC(x, y, subdivisions = 10000, performance = NULL, ci = TRUE, alpha = 0.05, method = "Kronrod", reltol = .Machine$double.eps^0.25)

`x`	the biomarker values. numeric vector. REQUIRED.
`y`	the class labels. numeric vector, character vector or logical vector. REQUIRED.
`subdivisions`	the maximum number of subintervals used for the integration. positive integer. Only used if `method="integrate"`.
`performance`	an object of class `performance` can be supplied instead of arguments `x` and `y`.
`ci`	should the confidence interval be computed? logical.
`alpha`	the type 1 error rate. numeric.
`method`	the integration method used to compute the area under the curve. Any of `"integrate"`, `"Kronrod"`, `"Richardson"`, `"Clenshaw"`, `"Simpson"` or `"Romberg"`.
`reltol`	the relative accuracy requested. Positive numeric.

This function requires the ROCR package to be installed.
The numeric integration of the precision over the recall values can be performed either using the integrate function of the stats package (if method="integrate") or using the integral function of the pracma package. In the latter case, the method argument is used to define the integration procedure (see the documentation of integral for more details).
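As an illustration of what integrating precision over recall means, here is a minimal base-R sketch. It is not the package's implementation: the helper `pr_curve` is hypothetical, the larger of the two labels is assumed to be the positive class, and the curve is linearly interpolated before being passed to `stats::integrate`. A plain trapezoidal sum is shown alongside for comparison; the two values can differ slightly because of how tied recall values are handled.

```r
# Hypothetical helper: precision and recall at each distinct threshold of x.
# Assumption: the larger of the two labels in y is the positive class.
pr_curve <- function(x, y) {
  stopifnot(length(x) == length(y), length(unique(y)) == 2)
  positive <- y == sort(unique(y))[2]
  ord <- order(x, decreasing = TRUE)
  tp <- cumsum(positive[ord])                    # true positives above each threshold
  fp <- cumsum(!positive[ord])                   # false positives above each threshold
  keep <- !duplicated(x[ord], fromLast = TRUE)   # one point per distinct value of x
  data.frame(recall = tp[keep] / sum(positive),
             precision = tp[keep] / (tp[keep] + fp[keep]))
}

set.seed(1)
x <- c(rnorm(1000, 0), rnorm(100, 2))
y <- c(rep(0, 1000), rep(1, 100))
pr <- pr_curve(x, y)

# (a) trapezoidal rule on the observed (recall, precision) points
auprc_trapz <- sum(diff(pr$recall) *
                   (head(pr$precision, -1) + tail(pr$precision, -1)) / 2)

# (b) adaptive quadrature on a linear interpolant, in the spirit of
#     method = "integrate"; tied recall values are averaged (ties = mean)
f <- approxfun(pr$recall, pr$precision, ties = mean, rule = 2)
auprc_quad <- integrate(f, lower = min(pr$recall), upper = 1,
                        subdivisions = 10000,
                        rel.tol = .Machine$double.eps^0.25,
                        stop.on.error = FALSE)$value

print(c(trapezoid = auprc_trapz, integrate = auprc_quad))
```

Both approaches integrate only over the observed recall range, from the smallest recall to 1; how the curve is extended down to recall 0 is a separate design choice that this sketch leaves out.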
The confidence interval is computed using the first order delta method and the logistic transformation:

CI(AUPRC) = \left[ \frac{e^{\mu_\eta - 1.96 \tau}}{1 + e^{\mu_\eta - 1.96 \tau}} \; ; \; \frac{e^{\mu_\eta + 1.96 \tau}}{1 + e^{\mu_\eta + 1.96 \tau}} \right]

\mu_\eta = logit(\widehat{AUPRC})

\tau = \frac{1}{\sqrt{n \, \widehat{AUPRC} \, (1 - \widehat{AUPRC})}}

See section 3.2 of (Boyd, 2013) for more details.
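The logit interval above can be sketched in a few lines of base R. The function name `auprc_logit_ci` is hypothetical and this is not the package's code. Two assumptions are made: `n` is taken to be the number of positive examples (the text above does not define it, so check Boyd et al. before relying on this), and the constant 1.96 is replaced by `qnorm(1 - alpha / 2)`, which equals about 1.96 for `alpha = 0.05`.

```r
# Hypothetical helper: logit-transformed confidence interval for an AUPRC estimate.
# Assumptions: n is the number of positive examples; the normal quantile
# qnorm(1 - alpha / 2) generalises the 1.96 of the formula to other alpha levels.
auprc_logit_ci <- function(auprc, n, alpha = 0.05) {
  stopifnot(auprc > 0, auprc < 1, n > 0, alpha > 0, alpha < 1)
  z   <- qnorm(1 - alpha / 2)
  mu  <- qlogis(auprc)                          # mu_eta = logit(AUPRC)
  tau <- 1 / sqrt(n * auprc * (1 - auprc))      # standard error on the logit scale
  c(estimate = auprc,
    lower = plogis(mu - z * tau),               # back-transform with the logistic function
    upper = plogis(mu + z * tau))
}

auprc_logit_ci(auprc = 0.5, n = 100)
```

Because the bounds are computed on the logit scale and then mapped back, they always stay inside (0, 1), which a symmetric interval on the original scale would not guarantee.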

ARGUMENTS:
y must have exactly two levels.

If performance is set to NULL, the arguments x and y are used to build the performance object.

If ci=FALSE, the function returns a numeric between 0 and 1.
If ci=TRUE, it returns a numeric vector of length 3 containing the point estimate, the lower bound and the upper bound of the confidence interval.

Kendrick Boyd, Kevin H. Eng, and C. David Page. Area Under the Precision-Recall Curve: Point Estimates and Confidence Intervals. Machine Learning and Knowledge Discovery in Databases, 2013:451-466.

data(MRIaggr.Pat1_red, package = "MRIaggr")

## select parameter and binary outcome
cartoT2 <- selectContrast(MRIaggr.Pat1_red, param = "T2_FLAIR_t2", format = "vector")
cartoMASK <- selectContrast(MRIaggr.Pat1_red, param = "MASK_T2_FLAIR_t2", format = "vector")

## compute AUPRC
T2.AUPRC <- calcAUPRC(x = cartoT2, y = cartoMASK)

## compute AUC
## Not run: 
if(require(pROC)){
T2.AUC <- auc(roc(cartoMASK ~ cartoT2))
} 


## display
multiplot(MRIaggr.Pat1_red,param = "T2_FLAIR_t2", num = 1,
          index1 = list(coords = "MASK_T2_FLAIR_t2", outline = TRUE)
)

## End(Not run)

#### 2- with simulated data ####
n0 <- 1000
n1 <- c(10,100,1000)
for(iter_n in seq_along(n1)){
  X <- c(rnorm(n0,0),rnorm(n1[iter_n],2))
  Y <- c(rep(0,n0),rep(1,n1[iter_n]))
  print(calcAUPRC(X,Y))
}

## alternative way using a performance object
perfXY <- ROCR::performance(ROCR::prediction(X,Y), x.measure = "rec", measure = "prec")
calcAUPRC(performance = perfXY, subdivisions = 10000)