rocit: ROC Analysis of Binary Classifier

Description

rocit is the main function of the ROCit package. Given the diagnostic score and the class of each observation, it calculates the true positive rate (sensitivity) and the false positive rate (1 - specificity) at convenient cutoff values to construct the ROC curve. The function returns a "rocit" object, which can be passed as an argument to other S3 methods.

Usage

rocit(score, class, negref = NULL, method = "empirical", step = FALSE)

Arguments

score

A numeric array of diagnostic scores.

class

An array of the same length as score, containing the class of each observation.

negref

The reference value, same as the reference in convertclass. Depending on the type of class, it can be numeric or character. If specified, this value is converted to 0 and all other values are converted to 1. If NULL, the reference is set alphabetically.

method

The method of estimating the ROC curve. Currently supports "empirical", "binormal" and "nonparametric". Pattern matching is allowed through grep.

step

Logical, default is FALSE. Only applicable to the empirical method and ignored for the others. Indicates whether only horizontal and vertical steps should be used to produce the ROC curve. See "Details".
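The 0/1 conversion described for negref can be sketched in base R. The helper name to_binary is hypothetical and not part of ROCit, and the sketch assumes that "set alphabetically" means the first value in sorted order becomes the reference.

```r
# Hypothetical helper (not part of ROCit): map class labels to 0/1.
# Assumption: with negref = NULL, the first value in sorted order is the reference.
to_binary <- function(class, negref = NULL) {
  if (is.null(negref)) {
    negref <- sort(unique(class))[1]
  }
  as.integer(class != negref)   # reference -> 0, everything else -> 1
}

explicit_ref <- to_binary(c("+", "-", "+"), negref = "-")
default_ref  <- to_binary(c("sick", "healthy", "sick"))
```

Here "healthy" sorts before "sick", so it is treated as the reference in the second call.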

Details

The ROC curve is defined as the set of ordered pairs (FPR(c), TPR(c)), where -\infty < c < \infty, FPR(c) = P(D \ge c | Y = 0) and TPR(c) = P(D \ge c | Y = 1) at cutoff c. Alternatively, it can be defined as:

y(x) = 1 - G[F^{-1}(1-x)], 0 \le x \le 1

where F and G are the cumulative distribution functions of the diagnostic score in negative and positive responses, respectively. rocit evaluates TPR and FPR values at convenient cutoffs.
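As an illustration of the second definition, the sketch below evaluates y(x) = 1 - G[F^{-1}(1-x)] when F and G are normal CDFs. The normal assumption, the helper name roc_from_normals and the parameter values are chosen only for this example and are not taken from ROCit.

```r
# Hypothetical helper (not part of ROCit): ROC curve from two normal CDFs.
# F is the CDF of the negative group, G the CDF of the positive group.
roc_from_normals <- function(x, mu_neg, sd_neg, mu_pos, sd_pos) {
  cutoff <- qnorm(1 - x, mean = mu_neg, sd = sd_neg)   # F^{-1}(1 - x)
  1 - pnorm(cutoff, mean = mu_pos, sd = sd_pos)        # 1 - G(cutoff)
}

fpr <- seq(0, 1, by = 0.25)
tpr <- roc_from_normals(fpr, mu_neg = 0, sd_neg = 1, mu_pos = 1, sd_pos = 1)
```

When both groups share the same distribution, the function returns y(x) = x, the diagonal of an uninformative classifier.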

As the name implies, empirical TPR and FPR values are evaluated for method = "empirical". For "binormal", the distributions of the diagnostic score in the two groups are assumed to be normal, and their parameters are estimated by maximum likelihood. If method = "nonparametric", kernel density estimates (using density) are applied with the following bandwidths:

  • h_Y = 0.9 * min(\sigma_Y, IQR(D_Y)/1.34)/((n_Y)^{(1/5)})

  • h_{\bar{Y}} = 0.9 * min(\sigma_{\bar{Y}}, IQR(D_{\bar{Y}})/1.34)/((n_{\bar{Y}})^{(1/5)})

as described in Zou et al. From the kernel estimates of the PDFs, the CDFs are estimated using the trapezoidal rule.
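The bandwidth rule and the trapezoidal CDF step can be sketched in base R as follows. The helper name zou_bandwidth and the simulated scores are illustrative assumptions, and the sketch takes \sigma to be the sample standard deviation returned by sd; it is not the package's internal code.

```r
# Hypothetical helper (not part of ROCit): bandwidth from the rule above,
# assuming sigma is the sample standard deviation.
zou_bandwidth <- function(d) {
  0.9 * min(sd(d), IQR(d) / 1.34) / length(d)^(1 / 5)
}

set.seed(1)
pos_scores <- rnorm(60, mean = 1)   # simulated scores, positive group
neg_scores <- rnorm(80, mean = 0)   # simulated scores, negative group

h_pos <- zou_bandwidth(pos_scores)
h_neg <- zou_bandwidth(neg_scores)

# Gaussian kernel density estimate for the positive group with that bandwidth
dens_pos <- density(pos_scores, bw = h_pos)

# CDF on the density grid via the trapezoidal rule
dx <- diff(dens_pos$x)
cdf_pos <- c(0, cumsum(dx * (head(dens_pos$y, -1) + tail(dens_pos$y, -1)) / 2))
```

For non-degenerate data this bandwidth coincides with the value returned by stats::bw.nrd0, the default bandwidth selector of density.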

For the "empirical" ROC, the algorithm first rank orders the data and calculates TPR and FPR by treating all observations predicted up to a certain level as positive. If step is TRUE, the ROC curve is generated from all the calculated {FPR, TPR} pairs, regardless of ties in the data. If step is FALSE, the ROC curve follows a diagonal path across ties.
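One way to realize the rank-ordering and the two tie treatments is sketched below in base R. The helper names empirical_roc and trapezoid_auc are hypothetical, labels are assumed to be already coded as 0/1, and placing positives first within ties is an arbitrary choice of this sketch; the package's internal implementation may differ.

```r
# Hypothetical helper (not part of ROCit): empirical ROC points from scores
# and 0/1 labels, where 1 marks the positive class.
empirical_roc <- function(score, label, step = FALSE) {
  # Sort by decreasing score; within ties, positives come first (sketch choice)
  ord <- order(score, label, decreasing = TRUE)
  score <- score[ord]
  label <- label[ord]
  tp <- cumsum(label == 1)   # positives captured down to each rank
  fp <- cumsum(label == 0)   # negatives captured down to each rank
  if (!step) {
    # Keep only the last row of each run of tied scores, so that a tie
    # contributes one diagonal segment instead of separate steps
    keep <- !duplicated(score, fromLast = TRUE)
    tp <- tp[keep]
    fp <- fp[keep]
    score <- score[keep]
  }
  data.frame(cutoff = c(Inf, score),
             TPR = c(0, tp / sum(label == 1)),
             FPR = c(0, fp / sum(label == 0)))
}

# Trapezoidal area under the {FPR, TPR} points
trapezoid_auc <- function(roc) {
  sum(diff(roc$FPR) * (head(roc$TPR, -1) + tail(roc$TPR, -1)) / 2)
}

score <- c(0.9, 0.8, 0.8, 0.6, 0.4, 0.3)
label <- c(1, 1, 0, 1, 0, 0)

roc_diag <- empirical_roc(score, label, step = FALSE)
roc_step <- empirical_roc(score, label, step = TRUE)
auc_diag <- trapezoid_auc(roc_diag)   # 5/6
auc_step <- trapezoid_auc(roc_step)   # 8/9
```

In this toy data the tie at 0.8 involves one positive and one negative observation. The diagonal treatment gives 5/6, which matches the Mann-Whitney estimate with ties counted as one half, while the step treatment depends on how the tied observations are ordered.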

For the "empirical" ROC, the trapezoidal rule is applied to estimate the area under the curve (AUC). For "binormal", AUC is estimated by \Phi(A/\sqrt{1 + B^2}), where A and B are functions of the mean and variance of the diagnostic score in the two groups. For "nonparametric", AUC is estimated by

\frac{1}{n_Yn_{\bar{Y}}} \sum_{i=1}^{n_{\bar{Y}}} \sum_{j=1}^{n_{Y}} \Phi( \frac{D_{Y_j}-D_{{\bar{Y}}_i}}{\sqrt{h_Y^2+h_{\bar{Y}}^2}} )
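Both closed-form AUC estimates can be sketched in a few lines of base R. The helper names are hypothetical and not part of ROCit. For the binormal case the sketch assumes the common parameterization A = (\mu_Y - \mu_{\bar{Y}})/\sigma_Y and B = \sigma_{\bar{Y}}/\sigma_Y with maximum likelihood standard deviations, which the text above does not spell out.

```r
# Hypothetical helpers (not part of ROCit).

# Maximum likelihood standard deviation (divides by n, not n - 1)
ml_sd <- function(d) sqrt(mean((d - mean(d))^2))

# Binormal AUC, assuming A = (mu_pos - mu_neg) / sd_pos and B = sd_neg / sd_pos
binormal_auc <- function(pos, neg) {
  A <- (mean(pos) - mean(neg)) / ml_sd(pos)
  B <- ml_sd(neg) / ml_sd(pos)
  pnorm(A / sqrt(1 + B^2))
}

# Kernel-smoothed AUC: average of Phi over all positive/negative score pairs
kernel_auc <- function(pos, neg, h_pos, h_neg) {
  diffs <- outer(pos, neg, "-")   # every positive score minus every negative score
  mean(pnorm(diffs / sqrt(h_pos^2 + h_neg^2)))
}

# Bandwidth rule from the Details section, assuming sigma is sd()
bandwidth <- function(d) 0.9 * min(sd(d), IQR(d) / 1.34) / length(d)^(1 / 5)

pos <- c(2, 3, 4)
neg <- c(0, 1, 2)
auc_binormal <- binormal_auc(pos, neg)
auc_kernel   <- kernel_auc(pos, neg, bandwidth(pos), bandwidth(neg))
```

Under this parameterization A/\sqrt{1 + B^2} simplifies to the difference in means divided by the square root of the sum of the two variances. As both bandwidths shrink toward zero, the kernel estimate approaches the empirical AUC with ties counted as one half.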

Value

A list of class "rocit", having the following elements:

method

The method applied to estimate the ROC curve.

pos_count

Number of positive responses.

neg_count

Number of negative responses.

pos_D

Array of diagnostic scores in positive responses.

neg_D

Array of diagnostic scores in negative responses.

AUC

Area under curve. See "Details".

Cutoff

Array of cutoff values at which the true positive rates and false positive rates are evaluated. Applicable for "empirical" and "nonparametric".

param

Maximum likelihood estimates of \mu and \sigma of the diagnostic score in two groups. Applicable for "binormal".

TPR

Array of true positive rates (or sensitivities or recalls), evaluated at the cutoff values.

FPR

Array of false positive rates (or 1-specificity), evaluated at the cutoff values.

Note

The algorithm is designed for complete cases. If NAs are found in either score or class, the corresponding observations are removed.
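An equivalent pre-filtering step can be written explicitly with base R, which makes it easy to see how many observations are dropped. The vectors below are made-up illustration data, and this is a sketch of complete-case filtering rather than the package's internal code.

```r
# Made-up illustration data with missing values in both vectors
score <- c(0.2, NA, 0.7, 0.9, 0.4)
class <- c("-", "+", NA, "+", "-")

# Keep only observations where both score and class are present
keep <- complete.cases(score, class)
score_cc <- score[keep]
class_cc <- class[keep]
n_dropped <- sum(!keep)
```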

References

Pepe, Margaret Sullivan. The Statistical Evaluation of Medical Tests for Classification and Prediction. Oxford University Press, 2003.

Zou, Kelly H., W. J. Hall, and David E. Shapiro. "Smooth non-parametric receiver operating characteristic (ROC) curves for continuous diagnostic tests." Statistics in Medicine 16, no. 19 (1997): 2143-2156.

See Also

ciROC, ciAUC, plot.rocit, gainstable, ksplot

Examples

# --------------------- fit ROC curves
library(ROCit)
data("Diabetes")
roc_empirical <- rocit(score = Diabetes$chol, class = Diabetes$dtest,
                       negref = "-") # default method is "empirical"
roc_binormal <- rocit(score = Diabetes$chol, class = Diabetes$dtest,
                      negref = "-", method = "bin") # partial match for "binormal"

# --------------------- summarize
summary(roc_empirical)
summary(roc_binormal)

# --------------------- plot
plot(roc_empirical)
plot(roc_binormal, col = c("#00BA37", "#F8766D"),
     legend = FALSE, YIndex = FALSE)



ROCit documentation built on May 29, 2024, 2:15 a.m.