# rocit: ROC Analysis of Binary Classifier

In ROCit: Performance Assessment of Binary Classifier with Visualization

## Description

rocit is the main function of the ROCit package. Given the diagnostic score and the class of each observation, it calculates the true positive rate (sensitivity) and the false positive rate (1 - specificity) at convenient cutoff values to construct the ROC curve. The function returns a "rocit" object, which can be passed as an argument to other S3 methods.

## Usage

    rocit(score, class, negref = NULL, method = "empirical", step = FALSE)

## Arguments

| Argument | Description |
| --- | --- |
| score | A numeric array of diagnostic scores. |
| class | An array of the same length as score, containing the class of each observation. |
| negref | The reference value, same as the reference in convertclass. Depending on the type of class, it can be numeric or character. If specified, this value is converted to 0 and every other value is converted to 1. If NULL, the reference is set alphabetically. |
| method | The method of estimating the ROC curve. Currently supports "empirical", "binormal" and "nonparametric". Pattern matching is allowed through grep. |
| step | Logical, default FALSE. Only applicable to the empirical method and ignored for the others. Indicates whether only horizontal and vertical steps should be used to produce the ROC curve. See "Details". |
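The 0/1 conversion described for negref can be illustrated with a minimal base R sketch. The helper name to_binary is hypothetical and is not the package's convertclass; the sketch also assumes that "set alphabetically" means the alphabetically first value becomes the reference.

```r
# Hypothetical helper, NOT ROCit's convertclass(): maps the reference
# value to 0 and every other value to 1.
to_binary <- function(class, negref = NULL) {
  class_chr <- as.character(class)
  if (is.null(negref)) {
    # Assumption: the alphabetically first value is used as the reference
    negref <- sort(unique(class_chr))[1]
  }
  as.integer(class_chr != as.character(negref))
}

labels <- c("+", "-", "-", "+")
binary_labels <- to_binary(labels, negref = "-")

# With no reference given, "a" is picked because it sorts first
binary_default <- to_binary(c("b", "a", "b"))
```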

## Details

The ROC curve is defined as the set of ordered pairs (FPR(c), TPR(c)) for -∞ < c < ∞, where FPR(c) = P(D ≥ c | Y = 0) and TPR(c) = P(D ≥ c | Y = 1) at cutoff c. Alternatively, it can be defined as:

y(x) = 1 - G[F^{-1}(1-x)], 0 ≤ x ≤ 1

where F and G are the cumulative distribution functions of the diagnostic score in negative and positive responses respectively. rocit evaluates TPR and FPR values at convenient cutoffs.
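As a small illustration of the definition, the empirical versions of FPR(c) and TPR(c) are the shares of negative and positive observations whose score is at least c. The data and the helper name rates_at below are made up for this sketch and are not part of the package.

```r
# Toy data, invented for illustration: y = 1 is positive, y = 0 is negative
score <- c(0.9, 0.8, 0.7, 0.6, 0.55, 0.4)
y     <- c(1,   1,   0,   1,   0,    0)

# Hypothetical helper: empirical P(D >= cutoff | Y = 0) and P(D >= cutoff | Y = 1)
rates_at <- function(cutoff, score, y) {
  c(FPR = mean(score[y == 0] >= cutoff),
    TPR = mean(score[y == 1] >= cutoff))
}

# One of three negatives and two of three positives score at least 0.65
rates_065 <- rates_at(0.65, score, y)
```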

As the name implies, empirical TPR and FPR values are evaluated for method = "empirical". For "binormal", the distribution of the diagnostic score in each group is assumed to be normal, and the parameters are estimated by maximum likelihood. If method = "nonparametric", kernel density estimates (using density) are applied with the following bandwidths:

• h_Y = 0.9 * min(σ_Y, IQR(D_Y)/1.34)/((n_Y)^{(1/5)})

• h_{\bar{Y}} = 0.9 * min(σ_{\bar{Y}}, IQR(D_{\bar{Y}})/1.34)/((n_{\bar{Y}})^{(1/5)})

as described in Zou et al. From the kernel estimates of the PDFs, the CDFs are estimated using the trapezoidal rule.
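The bandwidth rule above can be written directly in base R. This is a sketch rather than the package's internal code: the helper name bw_zou is hypothetical, the scores are made up, and σ is assumed to be the usual sample standard deviation returned by sd. Under that assumption the value coincides with R's default rule-of-thumb bandwidth, bw.nrd0, whenever the spread is non-zero.

```r
# Hypothetical helper (not part of ROCit): rule-of-thumb bandwidth for one group
bw_zou <- function(d) {
  0.9 * min(sd(d), IQR(d) / 1.34) / length(d)^(1 / 5)
}

# Made-up diagnostic scores for the positive group
pos_scores <- c(4.1, 5.3, 5.9, 6.4, 7.0, 8.2)
h_pos <- bw_zou(pos_scores)

# Gaussian kernel density estimate using that bandwidth
dens_pos <- density(pos_scores, bw = h_pos)
```

The same helper would be applied separately to the negative group to obtain its own bandwidth.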

For the "empirical" ROC, the algorithm first rank-orders the data and calculates TPR and FPR by treating all observations predicted up to a certain level as positive. If step is TRUE, the ROC curve is generated from all the calculated {FPR, TPR} pairs regardless of ties in the data. If step is FALSE, the ROC curve follows a diagonal path for ties.
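The effect of ties can be seen in a short base R sketch. It is an illustration of the idea under stated assumptions, not the package's implementation: the data are made up, and the variable names are chosen for this example only. Two observations share the score 0.8, one positive and one negative.

```r
# Toy data with a tie at 0.8 (one positive, one negative)
score <- c(0.9, 0.8, 0.8, 0.6, 0.4)
y     <- c(1,   1,   0,   0,   1)

# Rank-order from highest to lowest score
ord <- order(score, decreasing = TRUE)
s   <- score[ord]
lab <- y[ord]

# One point per observation: tied observations produce separate
# vertical and horizontal moves (the step = TRUE idea)
tpr_all <- cumsum(lab == 1) / sum(lab == 1)
fpr_all <- cumsum(lab == 0) / sum(lab == 0)

# One point per distinct score: keep only the last row of each tie group,
# so a tie becomes a single diagonal move (the step = FALSE idea)
keep    <- !duplicated(s, fromLast = TRUE)
tpr_tie <- tpr_all[keep]
fpr_tie <- fpr_all[keep]
```

In the tie-aware version, the curve moves from (0, 1/3) straight to (1/2, 2/3), which is the diagonal segment contributed by the tied pair.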

For the "empirical" ROC, the trapezoidal rule is applied to estimate the area under the curve (AUC). For "binormal", AUC is estimated by Φ(A/√(1 + B^2)), where A and B are functions of the mean and variance of the diagnostic score in the two groups. For "nonparametric", AUC is estimated by

\frac{1}{n_Yn_{\bar{Y}}} ∑_{i=1}^{n_{\bar{Y}}} ∑_{j=1}^{n_{Y}} Φ( \frac{D_{Y_j}-D_{{\bar{Y}}_i}}{√{h_Y^2+h_{\bar{Y}}^2}} )
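The three AUC estimates can be sketched in base R as follows. All function names are hypothetical and none of this is taken from the package source. For the binormal case, the text does not spell out A and B, so the sketch assumes the common convention A = (μ_Y - μ_Ȳ)/σ_Y and B = σ_Ȳ/σ_Y, with Y the positive group.

```r
# Trapezoidal AUC from ROC points; assumes the points run from (0, 0) to (1, 1)
auc_trapezoid <- function(fpr, tpr) {
  sum(diff(fpr) * (head(tpr, -1) + tail(tpr, -1)) / 2)
}

# Binormal AUC, assuming A = (mu_pos - mu_neg) / sd_pos and B = sd_neg / sd_pos
auc_binormal <- function(mu_pos, sd_pos, mu_neg, sd_neg) {
  A <- (mu_pos - mu_neg) / sd_pos
  B <- sd_neg / sd_pos
  pnorm(A / sqrt(1 + B^2))
}

# Kernel-smoothed AUC: average of Phi((D_Yj - D_Ybar_i) / sqrt(h_Y^2 + h_Ybar^2))
# over all positive/negative pairs
auc_kernel <- function(pos, neg, h_pos, h_neg) {
  diffs <- outer(pos, neg, "-")
  mean(pnorm(diffs / sqrt(h_pos^2 + h_neg^2)))
}

# Made-up ROC points that include one diagonal segment
auc_toy <- auc_trapezoid(fpr = c(0, 0, 0.5, 1, 1),
                         tpr = c(0, 1/3, 2/3, 2/3, 1))
```

Taking the mean over the matrix returned by outer is equivalent to the double sum divided by the product of the two group sizes.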

## Value

A list of class "rocit", with the following elements:

| Element | Description |
| --- | --- |
| method | The method applied to estimate the ROC curve. |
| pos_count | Number of positive responses. |
| neg_count | Number of negative responses. |
| pos_D | Array of diagnostic scores in positive responses. |
| neg_D | Array of diagnostic scores in negative responses. |
| AUC | Area under the curve. See "Details". |
| Cutoff | Array of cutoff values at which the true positive rates and false positive rates are evaluated. Applicable for "empirical" and "nonparametric". |
| param | Maximum likelihood estimates of μ and σ of the diagnostic score in the two groups. Applicable for "binormal". |
| TPR | Array of true positive rates (or sensitivities or recalls), evaluated at the cutoff values. |
| FPR | Array of false positive rates (or 1-specificity), evaluated at the cutoff values. |

## Note

The algorithm is designed for complete cases. If NAs are found in either score or class, the corresponding observations are removed.
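To see in advance which observations would survive such a filter, the same complete-case selection can be reproduced in base R. This is an illustrative sketch with made-up data, not the package's internal code.

```r
# Made-up inputs with one NA in each vector
score <- c(0.2, NA, 0.7, 0.9)
class <- c("-", "+", NA, "+")

# TRUE only where both score and class are observed
ok <- complete.cases(score, class)

score_cc <- score[ok]
class_cc <- class[ok]
n_dropped <- sum(!ok)
```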

## References

Pepe, Margaret Sullivan. The statistical evaluation of medical tests for classification and prediction. Medicine, 2003.

Zou, Kelly H., W. J. Hall, and David E. Shapiro. "Smooth non-parametric receiver operating characteristic (ROC) curves for continuous diagnostic tests." Statistics in medicine 16, no. 19 (1997): 2143-2156.

## See Also

ciROC, ciAUC, plot.rocit, gainstable, ksplot

## Examples

    library(ROCit)

    # ---------------------
    data("Diabetes")
    roc_empirical <- rocit(score = Diabetes$chol, class = Diabetes$dtest,
                           negref = "-") # default method empirical
    roc_binormal <- rocit(score = Diabetes$chol, class = Diabetes$dtest,
                          negref = "-", method = "bin")

    # ---------------------
    summary(roc_empirical)
    summary(roc_binormal)

    # ---------------------
    plot(roc_empirical)
    plot(roc_binormal, col = c("#00BA37", "#F8766D"),
         legend = FALSE, YIndex = FALSE)