Miscellaneous-calcAUPRC: Area under the PR curve


Description

Compute the area under the precision recall curve by numerical integration.

Usage

calcAUPRC(x, y, subdivisions = 10000, performance = NULL, ci = TRUE, alpha = 0.05, 
          method = "Kronrod", reltol = .Machine$double.eps^0.25)

Arguments

x

the biomarker values. numeric vector. REQUIRED.

y

the class labels. numeric vector, character vector or logical vector. REQUIRED.

subdivisions

the maximum number of subintervals used for the integration. positive integer. Only used if method="integrate".

performance

an object of class performance can be supplied instead of arguments x and y.

ci

should the confidence interval be computed? logical.

alpha

the type 1 error rate. numeric.

method

the integration method used to compute the area under the curve. Any of "integrate", "Kronrod", "Richardson", "Clenshaw", "Simpson" or "Romberg".

reltol

the relative accuracy requested. Positive numeric.

Details

This function requires the ROCR package to be installed.
The numeric integration of the precision over the recall values can be performed either using the integrate function of the stats package (if method="integrate") or using the integral function of the pracma package. In the latter case, the method argument is used to define the integration procedure (see the documentation of integral for more details).
The confidence interval (CI) is computed using the first-order delta method and the logistic transformation:

CI(AUPRC) = \left[ \frac{e^{\mu_\eta - 1.96 \tau}}{1 + e^{\mu_\eta - 1.96 \tau}} \; ; \; \frac{e^{\mu_\eta + 1.96 \tau}}{1 + e^{\mu_\eta + 1.96 \tau}} \right]

\mu_\eta = logit(\widehat{AUPRC})

\tau = \frac{1}{\sqrt{n \cdot \widehat{AUPRC} \cdot (1 - \widehat{AUPRC})}}

See section 3.2 of (Boyd, 2013) for more details.
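
As an illustration of the formulas above, here is a minimal base-R sketch of the logit-transformed interval. The function name logit_ci and its arguments are hypothetical and are not part of MRIaggr. Two assumptions are made: the quantile 1.96 is generalised to qnorm(1 - alpha/2), and n is the sample size entering the formula (to my understanding, Boyd et al. use the number of positive examples). The package's own implementation may differ.

```r
# Hypothetical helper, not the MRIaggr implementation.
# auprc: point estimate strictly between 0 and 1
# n:     sample size used in the formula (assumed: number of positive examples)
# alpha: type 1 error rate
logit_ci <- function(auprc, n, alpha = 0.05) {
  stopifnot(auprc > 0, auprc < 1, n > 0, alpha > 0, alpha < 1)
  z   <- qnorm(1 - alpha / 2)               # about 1.96 when alpha = 0.05
  mu  <- log(auprc / (1 - auprc))           # logit of the point estimate
  tau <- 1 / sqrt(n * auprc * (1 - auprc))  # standard error on the logit scale
  c(estimate = auprc,
    lower    = plogis(mu - z * tau),        # back-transform with the logistic function
    upper    = plogis(mu + z * tau))
}

ci <- logit_ci(auprc = 0.8, n = 100)
print(ci)
```

Because the bounds are back-transformed with the logistic function, they always lie strictly between 0 and 1, which a symmetric interval on the original scale does not guarantee.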

ARGUMENTS:
y must have exactly two levels.

If performance is set to NULL, then x and y will be used to form the performance object.
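
The numerical integration of precision over recall described in this section can be sketched in base R as follows. This is only an illustration under stated assumptions, not the package's code: the precision-recall points are computed by hand rather than with ROCR, ties in recall are averaged before linear interpolation, and only the method="integrate" route (stats::integrate) is shown. All variable names are hypothetical.

```r
set.seed(1)
x <- c(rnorm(200, 0), rnorm(50, 2))  # simulated biomarker values
y <- c(rep(0, 200), rep(1, 50))      # binary class labels

# Precision and recall when the k highest-scoring observations are called positive
ord       <- order(x, decreasing = TRUE)
tp        <- cumsum(y[ord] == 1)     # true positives among the top k
fp        <- cumsum(y[ord] == 0)     # false positives among the top k
recall    <- tp / sum(y == 1)
precision <- tp / (tp + fp)

# Linear interpolation of precision as a function of recall
# (assumption: average the precision over tied recall values)
pr_fun <- approxfun(recall, precision, ties = mean)

# Adaptive quadrature over the observed recall range
auprc_integrate <- integrate(pr_fun, lower = min(recall), upper = 1,
                             subdivisions = 10000,
                             rel.tol = .Machine$double.eps^0.25)$value

# Trapezoidal rule on the same interpolated curve, as a cross-check
u <- unique(recall)                  # already sorted, recall is non-decreasing
p <- pr_fun(u)
auprc_trapz <- sum(diff(u) * (head(p, -1) + tail(p, -1)) / 2)

print(c(integrate = auprc_integrate, trapezoid = auprc_trapz))
```

Since the interpolated curve is piecewise linear, the trapezoidal rule on its knots is exact, so the two values should agree up to the requested relative tolerance.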

Value

If ci=FALSE, a numeric between 0 and 1.
If ci=TRUE, a numeric vector of length 3 containing the point estimate and the lower and upper bounds of the confidence interval.

References

Kendrick Boyd, Kevin H. Eng, and C. David Page. Area Under the Precision-Recall Curve: Point Estimates and Confidence Intervals. Machine Learning and Knowledge Discovery in Databases, 2013:451-466.

Examples

data(MRIaggr.Pat1_red, package = "MRIaggr")

## select parameter and binary outcome
cartoT2 <- selectContrast(MRIaggr.Pat1_red, param = "T2_FLAIR_t2", format = "vector")
cartoMASK <- selectContrast(MRIaggr.Pat1_red, param = "MASK_T2_FLAIR_t2", format = "vector")

## compute AUPRC
T2.AUPRC <- calcAUPRC(x = cartoT2, y = cartoMASK)

## compute AUC
## Not run: 
if(require(pROC)){
T2.AUC <- auc(roc(cartoMASK ~ cartoT2))
} 


## display
multiplot(MRIaggr.Pat1_red,param = "T2_FLAIR_t2", num = 1,
          index1 = list(coords = "MASK_T2_FLAIR_t2", outline = TRUE)
)

## End(Not run)

#### 2- with simulated data ####
n0 <- 1000
n1 <- c(10,100,1000)
for(iter_n in seq_along(n1)){
  X <- c(rnorm(n0,0),rnorm(n1[iter_n],2))
  Y <- c(rep(0,n0),rep(1,n1[iter_n]))
  print(calcAUPRC(X,Y))
}

## alternative way using a performance object
perfXY <- ROCR::performance(ROCR::prediction(X,Y), x.measure = "rec", measure = "prec")
calcAUPRC(performance = perfXY, subdivisions = 10000)

bozenne/MRIaggr documentation built on May 13, 2019, 1:39 a.m.