rocit: ROC Analysis of Binary Classifier

Description

rocit is the main function of the ROCit package. Given the diagnostic score and the class of each observation, it calculates the true positive rate (sensitivity) and the false positive rate (1 - specificity) at convenient cutoff values to construct the ROC curve. The function returns a "rocit" object, which can be passed as an argument to other S3 methods.

Usage

rocit(score, class, negref = NULL, method = "empirical", step = FALSE)

Arguments

`score`	A numeric array of diagnostic scores.
`class`	An array of the same length as `score`, containing the class of each observation.
`negref`	The reference value, same as the `reference` in `convertclass`. Depending on the type of `class`, it can be numeric or character. If specified, this value is converted to 0 and all other values are converted to 1. If NULL, the reference is set alphabetically.
`method`	The method of estimating the ROC curve. Currently supports `"empirical"`, `"binormal"` and `"nonparametric"`. Pattern matching is allowed through `grep`.
`step`	Logical, default is `FALSE`. Only applicable for the `empirical` method and ignored for the others. Indicates whether only horizontal and vertical steps should be used to produce the ROC curve. See "Details".
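
The class conversion described for `negref` can be sketched in base R. This is only an illustration: `to_binary` is a hypothetical helper, not a ROCit function, and it assumes that "set alphabetically" means the first value in sorted order becomes the reference.

```r
# Hypothetical helper (not part of ROCit): map class labels to 0/1.
to_binary <- function(class, negref = NULL) {
  if (is.null(negref)) {
    # Assumption: the alphabetically first label is used as the reference
    negref <- sort(unique(class))[1]
  }
  # The reference value becomes 0, everything else becomes 1
  ifelse(class == negref, 0L, 1L)
}

labels <- c("+", "-", "-", "+", "-")
y_given <- to_binary(labels, negref = "-")             # reference given explicitly
y_default <- to_binary(c("case", "control", "case"))   # reference chosen by sorting
```

With `negref = "-"`, the "-" observations are coded 0 and the "+" observations 1. In the second call no reference is given, so "case" (which sorts first) is coded 0.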

Details

The ROC curve is defined as the set of ordered pairs (FPR(c), TPR(c)), -\infty < c < \infty, where FPR(c) = P(D \ge c | Y = 0) and TPR(c) = P(D \ge c | Y = 1) at cutoff c. Alternatively, it can be defined as:

y(x) = 1 - G[F^{-1}(1-x)], 0 \le x \le 1

where F and G are the cumulative distribution functions of the diagnostic score in negative and positive responses respectively. rocit evaluates TPR and FPR values at convenient cutoffs.
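
The two definitions describe the same curve, which can be checked numerically. The sketch below assumes normal score distributions with made-up parameters (`mu0`, `sd0`, `mu1`, `sd1` are illustrative values, not taken from the package) and compares the cutoff-based pairs with y(x) = 1 - G[F^{-1}(1-x)].

```r
# Illustrative population parameters (assumed for this sketch only)
mu0 <- 0;   sd0 <- 1    # negatives, CDF F
mu1 <- 1.5; sd1 <- 1.2  # positives, CDF G

# Functional form: y(x) = 1 - G[F^{-1}(1 - x)]
roc_y <- function(x) 1 - pnorm(qnorm(1 - x, mu0, sd0), mu1, sd1)

# Cutoff form: (FPR(c), TPR(c)) over a grid of cutoffs
cutoffs <- seq(-4, 5, by = 0.5)
fpr <- 1 - pnorm(cutoffs, mu0, sd0)  # P(D >= c | Y = 0)
tpr <- 1 - pnorm(cutoffs, mu1, sd1)  # P(D >= c | Y = 1)

# Largest disagreement between the two definitions on the grid
max_gap <- max(abs(roc_y(fpr) - tpr))
```

Up to floating-point error, `max_gap` is zero: evaluating the functional form at x = FPR(c) returns TPR(c).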

As the name implies, empirical TPR and FPR values are evaluated for method = "empirical". For "binormal", the diagnostic scores in each group are assumed to be normally distributed, and the parameters are estimated by maximum likelihood. If method = "nonparametric", then kernel density estimates (using density) are applied with the following bandwidths:

h_Y = 0.9 * min(\sigma_Y, IQR(D_Y)/1.34)/((n_Y)^{(1/5)})
h_{\bar{Y}} = 0.9 * min(\sigma_{\bar{Y}}, IQR(D_{\bar{Y}})/1.34)/((n_{\bar{Y}})^{(1/5)})

as described in Zou et al. From the kernel estimates of the PDFs, the CDFs are estimated using the trapezoidal rule.
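
The bandwidth and the PDF-to-CDF step can be sketched in base R as follows. The data are simulated, a Gaussian kernel is assumed, and `bw_zou` and `kde_cdf` are hypothetical helper names. This illustrates the formulas above and is not the package's internal code.

```r
set.seed(1)
d_pos <- rnorm(60, mean = 1.5)  # simulated scores, positive responses
d_neg <- rnorm(80, mean = 0)    # simulated scores, negative responses

# Bandwidth: 0.9 * min(sd, IQR / 1.34) / n^(1/5)
bw_zou <- function(d) 0.9 * min(sd(d), IQR(d) / 1.34) / length(d)^(1 / 5)
h_pos <- bw_zou(d_pos)
h_neg <- bw_zou(d_neg)

# Common evaluation grid, padded so that almost all kernel mass is covered
pad <- 3 * max(h_pos, h_neg)
grid_from <- min(d_pos, d_neg) - pad
grid_to   <- max(d_pos, d_neg) + pad

# Kernel PDF via density(), then CDF via the cumulative trapezoidal rule
kde_cdf <- function(d, h) {
  k <- density(d, bw = h, kernel = "gaussian",
               from = grid_from, to = grid_to, n = 512)
  cdf <- c(0, cumsum(diff(k$x) * (head(k$y, -1) + tail(k$y, -1)) / 2))
  list(x = k$x, cdf = cdf)
}

cdf_pos <- kde_cdf(d_pos, h_pos)
cdf_neg <- kde_cdf(d_neg, h_neg)
```

The same bandwidth expression is what base R's `bw.nrd0` computes, so `bw.nrd0(d_pos)` should agree with `h_pos`.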

For "empirical" ROC, the algorithm first rank orders the data and calculates TPR and FPR by treating all observations scored at or above a given level as predicted positive. If step is TRUE, then the ROC curve is generated from all the calculated {FPR, TPR} pairs, regardless of ties in the data. If step is FALSE, then the ROC curve follows a diagonal path across ties.
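
A minimal base R sketch of this procedure is given below. `empirical_roc` is a hypothetical function written for illustration, and the way ties are collapsed is one plausible reading of the description, not the package source. The class labels are assumed to be already coded as 0/1.

```r
# Hypothetical sketch of the empirical ROC; y must be coded 0 (negative) / 1 (positive)
empirical_roc <- function(score, y, step = FALSE) {
  ord <- order(score, decreasing = TRUE)   # rank order, highest score first
  score <- score[ord]
  y <- y[ord]
  tp <- cumsum(y == 1)                     # positives at or above each level
  fp <- cumsum(y == 0)                     # negatives at or above each level
  # step = TRUE: keep one point per observation (horizontal/vertical moves only)
  # step = FALSE: keep one point per distinct score (diagonal move across ties)
  keep <- if (step) rep(TRUE, length(score)) else !duplicated(score, fromLast = TRUE)
  data.frame(
    cutoff = c(Inf, score[keep]),
    TPR = c(0, tp[keep] / sum(y == 1)),
    FPR = c(0, fp[keep] / sum(y == 0))
  )
}

score <- c(0.9, 0.8, 0.8, 0.6, 0.4)
y     <- c(1,   1,   0,   1,   0)
roc_diag <- empirical_roc(score, y)               # tie at 0.8 gives a diagonal segment
roc_step <- empirical_roc(score, y, step = TRUE)  # one point per observation
```

In `roc_diag`, the tied score 0.8 moves the curve from (0, 1/3) to (0.5, 2/3) in a single diagonal segment. In `roc_step`, the same move is split into two axis-parallel steps.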

For "empirical" ROC, the trapezoidal rule is applied to estimate the area under the curve (AUC). For "binormal", AUC is estimated by \Phi(A/\sqrt{1 + B^2}), where A and B are functions of the means and variances of the diagnostic score in the two groups. For "nonparametric", AUC is estimated by

\frac{1}{n_Yn_{\bar{Y}}} \sum_{i=1}^{n_{\bar{Y}}} \sum_{j=1}^{n_{Y}} \Phi( \frac{D_{Y_j}-D_{{\bar{Y}}_i}}{\sqrt{h_Y^2+h_{\bar{Y}}^2}} )
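
The three AUC estimates can be compared on simulated data with base R. This is a sketch under stated assumptions: the source does not define A and B, so the conventional binormal choice A = (\mu_Y - \mu_{\bar{Y}})/\sigma_Y and B = \sigma_{\bar{Y}}/\sigma_Y is assumed here, and the helper names `mle_sd` and `bw_zou` are hypothetical.

```r
set.seed(42)
d_pos <- rnorm(50, mean = 1)  # simulated scores, positive responses
d_neg <- rnorm(70, mean = 0)  # simulated scores, negative responses

# Empirical: trapezoidal rule over the (FPR, TPR) points
cutoffs <- sort(unique(c(d_pos, d_neg)), decreasing = TRUE)
tpr <- c(0, sapply(cutoffs, function(cut) mean(d_pos >= cut)))
fpr <- c(0, sapply(cutoffs, function(cut) mean(d_neg >= cut)))
auc_trapz <- sum(diff(fpr) * (head(tpr, -1) + tail(tpr, -1)) / 2)

# Binormal: Phi(A / sqrt(1 + B^2)), with the assumed conventional A and B
mle_sd <- function(d) sqrt(mean((d - mean(d))^2))  # ML estimate divides by n
A <- (mean(d_pos) - mean(d_neg)) / mle_sd(d_pos)
B <- mle_sd(d_neg) / mle_sd(d_pos)
auc_binormal <- pnorm(A / sqrt(1 + B^2))

# Nonparametric: average of Phi over all positive/negative pairs
bw_zou <- function(d) 0.9 * min(sd(d), IQR(d) / 1.34) / length(d)^(1 / 5)
h_pos <- bw_zou(d_pos)
h_neg <- bw_zou(d_neg)
pair_diff <- outer(d_pos, d_neg, "-")  # D_Y_j - D_Ybar_i for every pair
auc_kernel <- mean(pnorm(pair_diff / sqrt(h_pos^2 + h_neg^2)))
```

Taking the mean over the `outer` matrix is the double sum divided by n_Y n_{\bar{Y}}. The trapezoidal estimate coincides with the proportion of positive/negative pairs in which the positive score is larger, counting ties as one half.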

Value

A list of class "rocit", with the following elements:

`method`	The method applied to estimate ROC curve.
`pos_count`	Number of positive responses.
`neg_count`	Number of negative responses.
`pos_D`	Array of diagnostic scores in positive responses.
`neg_D`	Array of diagnostic scores in negative responses.
`AUC`	Area under curve. See "Details".
`Cutoff`	Array of cutoff values at which the true positive rates and false positive rates are evaluated. Applicable for `"empirical"` and `"nonparametric"`.
`param`	Maximum likelihood estimates of `\mu` and `\sigma` of the diagnostic score in two groups. Applicable for `"binormal"`.
`TPR`	Array of true positive rates (or sensitivities or recalls), evaluated at the cutoff values.
`FPR`	Array of false positive rates (or 1-specificity), evaluated at the cutoff values.

Note

The algorithm is designed for complete cases. If NAs are found in either score or class, the corresponding observations are removed.
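
The effect of complete-case filtering can be reproduced with base R's `complete.cases`. This sketch only illustrates which observations survive. Whether rocit uses this exact call internally is not stated in the source.

```r
score <- c(0.2, NA, 0.7, 0.9, 0.4)
cls   <- c("-", "+", NA, "+", "-")

# Keep only observations where both score and class are present
ok <- complete.cases(score, cls)
score_cc <- score[ok]
cls_cc   <- cls[ok]
```

Here the second observation (missing score) and the third (missing class) are dropped, leaving three complete pairs.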

References

Pepe, Margaret Sullivan. The Statistical Evaluation of Medical Tests for Classification and Prediction. Oxford University Press, 2003.

Zou, Kelly H., W. J. Hall, and David E. Shapiro. "Smooth non-parametric receiver operating characteristic (ROC) curves for continuous diagnostic tests." Statistics in Medicine 16, no. 19 (1997): 2143-2156.

Examples

# Load the package and the example data shipped with it
library(ROCit)
data("Diabetes")

# Estimate ROC curves; negref = "-" marks the negative class
roc_empirical <- rocit(score = Diabetes$chol, class = Diabetes$dtest,
                       negref = "-") # default method is "empirical"
roc_binormal <- rocit(score = Diabetes$chol, class = Diabetes$dtest,
                      negref = "-", method = "bin") # pattern-matched to "binormal"

# Summaries of the two fits
summary(roc_empirical)
summary(roc_binormal)

# Plots; the second call sets custom colors and turns off legend and YIndex
plot(roc_empirical)
plot(roc_binormal, col = c("#00BA37", "#F8766D"),
     legend = FALSE, YIndex = FALSE)

ROCit documentation built on May 29, 2024, 2:15 a.m.