roc: Build a ROC curve
In pROC: Display and Analyze ROC Curves

roc	R Documentation

Build a ROC curve

Description

This is the main function of the pROC package. It builds a ROC curve and returns a “roc” object, a list of class “roc”. This object can be printed, plotted, or passed to the functions auc, ci, smooth.roc and coords. Additionally, two roc objects can be compared with roc.test.

Usage

roc(...)
## S3 method for class 'formula'
roc(formula, data, ...)
## S3 method for class 'data.frame'
roc(data, response, predictor,
ret = c("roc", "coords", "all_coords"), ...)
## Default S3 method:
roc(response, predictor, controls, cases,
density.controls, density.cases,
levels=base::levels(as.factor(response)), percent=FALSE, na.rm=TRUE,
direction=c("auto", "<", ">"), algorithm = 2, quiet = FALSE, 
smooth=FALSE, auc=TRUE, ci=FALSE, plot=FALSE, smooth.method="binormal",
smooth.n=512, ci.method=NULL, density=NULL, ...)
roc_(data, response, predictor, ret = c("roc", "coords", "all_coords"), ...)

Arguments

`response`	a factor, numeric or character vector of responses (true class), typically encoded with 0 (controls) and 1 (cases). Only two classes can be used in a ROC curve. If the vector contains more than two unique values, or if their order could be ambiguous, use `levels` to specify which values must be used as control and case value. If the first argument was a `data.frame`, `response` should be the name of the column in `data` containing the response, quoted for `roc_`, and optionally quoted for `roc.data.frame` (non-standard evaluation or NSE).
`predictor`	a `numeric` or `ordered` vector of the same length as `response`, containing the predicted value of each observation. If the first argument was a `data.frame`, `predictor` should be the name of the column in `data` containing the predictor, quoted for `roc_`, and optionally quoted for `roc.data.frame` (non-standard evaluation or NSE).
`controls`, `cases`	instead of `response`, `predictor`, the data can be supplied as two `numeric` or `ordered` vectors containing the predictor values for control and case observations.
`density.controls`, `density.cases`	a smoothed ROC curve can be built directly from two densities on identical `x` points, as in `smooth`.
`formula`, `data`	a formula of the type `response~predictor`. If multiple predictors are passed, a named list of `roc` objects will be returned. The additional arguments `data` and `subset` are supported, but `na.action` is not; see `model.frame` for more details.
`levels`	the value of the response for controls and cases respectively. By default, the first two values of `levels(as.factor(response))` are taken, and the remaining levels are ignored. It usually captures two-class factor data correctly, but will frequently fail for other data types (response factor with more than 2 levels, or for example if your response is coded “controls” and “cases”, the levels will be inverted) and must then be specified here. If your data is coded as `0` and `1` with `0` being the controls, you can safely omit this argument.
`percent`	if the sensitivities, specificities and AUC must be given in percent (`TRUE`) or in fraction (`FALSE`, default).
`na.rm`	if `TRUE`, the `NA` values will be removed (ignored by `roc.formula`).
`direction`	how are positive observations defined? “<”: observations are positive when they are greater than or equal (`>=`) to the threshold. “>”: observations are positive when they are smaller than or equal (`<=`) to the threshold. “auto” (default): automatically detect in which group the median is higher and take the direction accordingly. See details. You should set this explicitly to “>” or “<” whenever you are resampling or randomizing the data, otherwise the curves will be biased towards higher AUC values.
`algorithm`	DEPRECATED. A value other than `2` will produce a warning. The argument will be removed in a future version.
`ret`	for `roc.data.frame` only, whether to return the threshold, sensitivity and specificity at all thresholds (“coords”), all the coordinates at all thresholds (“all_coords”) or the `roc` object (“roc”).
`quiet`	set to `TRUE` to turn off `message`s when `direction` and `levels` are auto-detected.
`smooth`	if TRUE, the ROC curve is passed to `smooth` to be smoothed.
`auc`	compute the area under the curve (AUC)? If `TRUE` (default), additional arguments can be passed to `auc`.
`ci`	compute the confidence interval (CI)? If set to `TRUE`, additional arguments can be passed to `ci`.
`plot`	plot the ROC curve? If `TRUE`, additional arguments can be passed to `plot.roc`.
`smooth.method`, `smooth.n`, `ci.method`	in `roc.formula` and `roc.default`, the `method` and `n` arguments to `smooth` (if `smooth=TRUE`) and the `method` argument to `ci` (if `ci=TRUE` and `of="auc"`) must be passed as `smooth.method`, `smooth.n` and `ci.method` to avoid confusion.
`density`	`density` argument passed to `smooth`.
`...`	further arguments passed to or from other methods, and especially: `auc`: `partial.auc`, `partial.auc.focus`, `partial.auc.correct`; `ci`: `of`, `conf.level`, `boot.n`, `boot.stratified`, `progress`; `ci.auc`: `reuse.auc`, `method`; `ci.thresholds`: `thresholds`; `ci.se`: `sensitivities`; `ci.sp`: `specificities`; `plot.roc`: `add`, `col` and most other arguments to the `plot.roc` function (see `plot.roc` directly for more details); `smooth`: `method`, `n`, and all other arguments (see `smooth` for more details).
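
The `direction` rule described for the arguments can be illustrated without pROC. The following is a minimal base-R sketch, not the package's implementation; the helper `is_positive` and the toy values are hypothetical and exist only for this illustration.

```r
# Hypothetical helper illustrating the `direction` rule; not pROC's code.
is_positive <- function(predictor, threshold, direction = c("<", ">")) {
  direction <- match.arg(direction)
  if (direction == "<") {
    predictor >= threshold   # "<": positive when greater than or equal
  } else {
    predictor <= threshold   # ">": positive when smaller than or equal
  }
}

x <- c(0.1, 0.4, 0.4, 0.9)   # toy predictor values
pos_lt <- is_positive(x, threshold = 0.4, direction = "<")
pos_gt <- is_positive(x, threshold = 0.4, direction = ">")
print(pos_lt)
print(pos_gt)
```

The same observations are classified in opposite ways under the two directions, which is why fixing `direction` matters when data are resampled.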

Details

This function's main job is to build a ROC object. See the “Value” section of this page for more details. Before returning, it will call (in this order) the smooth, auc, ci and plot.roc functions if the smooth, auc, ci and plot arguments (respectively) are set to TRUE. By default, only auc is called.

Data can be provided as response, predictor, where the predictor is the numeric (or ordered) level of the evaluated signal, and the response encodes the observation class (control or case). The levels argument specifies which response level must be taken as controls (first value of levels) or cases (second). It can safely be ignored when the response is encoded as 0 and 1, but the default will frequently fail otherwise. By default, the first two values of levels(as.factor(response)) are taken, and the remaining levels are ignored. This means that if your response is coded “control” and “case”, the levels will be inverted.
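
The inversion mentioned above follows from how base R orders factor levels. The short sketch below uses only base R; the toy `response` vector is made up for illustration.

```r
# as.factor() sorts character values alphabetically, so "case" comes
# before "control" and would be taken as the control level by default.
response <- c("control", "case", "control", "case")
default_levels <- levels(as.factor(response))
print(default_levels)

# Stating the order explicitly avoids the inversion; this is the vector
# one would pass as `levels = c("control", "case")`.
explicit_levels <- c("control", "case")
```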

In some cases, it is more convenient to pass the data as controls, cases, but both arguments are ignored if response, predictor were specified with non-NULL values. It is also possible to pass density data with density.controls, density.cases, which will result in a smoothed ROC curve even if smooth=FALSE, but these are ignored if response, predictor or controls, cases are provided.

Thresholds are selected as the means between any two consecutive values observed in the data. This choice is intended to ease their interpretation, as any data point will be unambiguously positive or negative regardless of whether the comparison operator includes equality or not. As a corollary, thresholds do not correspond to actual values in the data.
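
As a rough illustration of the midpoint rule, the following base-R sketch computes the means between consecutive distinct values of a toy predictor. It is an assumption-laden sketch rather than pROC's internal code, and the data are invented.

```r
# Sketch of midpoint thresholds; not pROC's internal code.
predictor <- c(0.2, 0.5, 0.5, 0.9, 1.4)    # toy values
u <- sort(unique(predictor))
midpoints <- (u[-1] + u[-length(u)]) / 2   # mean of each consecutive pair
print(midpoints)

# No midpoint coincides with an observed value, so ">=" and ">" split
# the data identically at every threshold.
same_split <- all(sapply(midpoints, function(t) {
  identical(predictor >= t, predictor > t)
}))
print(same_split)
```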

Specifications for auc, ci and plot.roc are not kept if auc, ci or plot are set to FALSE. In particular, in the following case:

    myRoc <- roc(..., auc.polygon=TRUE, grid=TRUE, plot=FALSE)
    plot(myRoc)

the plot will not have the AUC polygon nor the grid. Similarly, when comparing “roc” objects, the following is not possible:

    roc1 <- roc(..., partial.auc=c(1, 0.8), auc=FALSE)
    roc2 <- roc(..., partial.auc=c(1, 0.8), auc=FALSE)
    roc.test(roc1, roc2)

This will produce a test on the full AUC, not the partial AUC. To make a comparison on the partial AUC, you must repeat the specifications when calling roc.test:

    roc.test(roc1, roc2, partial.auc=c(1, 0.8))

Note that if roc was called with auc=TRUE, the latter syntax will not allow redefining the AUC specifications. You must use reuse.auc=FALSE for that.
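
A possible way to apply this, assuming the `aSAH` data set shipped with pROC and two of its markers (`s100b` and `wfns`), is sketched below. The reduced `boot.n` is only there to keep the sketch fast and is not a recommendation.

```r
library(pROC)
data(aSAH)

# Both curves are built with auc=TRUE (the default), so each stores a full AUC.
roc1 <- roc(aSAH$outcome, aSAH$s100b, quiet = TRUE)
roc2 <- roc(aSAH$outcome, aSAH$wfns, quiet = TRUE)

# reuse.auc=FALSE asks roc.test to recompute the AUC with the new
# specifications instead of reusing the one stored in the roc objects.
res <- roc.test(roc1, roc2, method = "bootstrap", reuse.auc = FALSE,
                partial.auc = c(1, 0.8), boot.n = 200)
print(res)
```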

Value

If the data contained any NA value and na.rm=FALSE, NA is returned. Otherwise, if smooth=FALSE, a list of class “roc” with the following fields:

`auc`	if called with `auc=TRUE`, a numeric of class “auc” as defined in `auc`.
`ci`	if called with `ci=TRUE`, a numeric of class “ci” as defined in `ci`.
`response`	the response vector. Patients whose response is not `%in%` `levels` are discarded. If `NA` values were removed, a `na.action` attribute similar to `na.omit` stores the row numbers.
`predictor`	the predictor vector converted to numeric as used to build the ROC curve. Patients whose response is not `%in%` `levels` are discarded. If `NA` values were removed, a `na.action` attribute similar to `na.omit` stores the row numbers.
`original.predictor`, `original.response`	the predictor and response vectors as passed in argument.
`levels`	the levels of the response as defined in argument.
`controls`	the predictor values for the control observations.
`cases`	the predictor values for the cases.
`percent`	if the sensitivities, specificities and AUC are reported in percent, as defined in argument.
`direction`	the direction of the comparison, as defined in argument.
`sensitivities`	the sensitivities defining the ROC curve.
`specificities`	the specificities defining the ROC curve.
`thresholds`	the thresholds at which the sensitivities and specificities were computed. See the “Details” section for how they are selected.
`call`	how the function was called. See `match.call` for more details.
`fun.sesp`	DEPRECATED. The value will be removed in a future version.

If smooth=TRUE a list of class “smooth.roc” as returned by smooth, with or without additional elements auc and ci (according to the call).
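
The fields listed above can be read directly from the returned list. The sketch below assumes the `aSAH` data set shipped with pROC; the choice of columns is only an example.

```r
library(pROC)
data(aSAH)

r <- roc(aSAH$outcome, aSAH$s100b, levels = c("Good", "Poor"),
         direction = "<", quiet = TRUE)

class(r)                 # the object is a list of class "roc"
r$levels                 # control level first, then case level
r$direction              # direction as defined in the call
length(r$controls)       # number of control observations
length(r$cases)          # number of case observations

# sensitivities, specificities and thresholds are parallel vectors
curve <- data.frame(threshold   = r$thresholds,
                    sensitivity = r$sensitivities,
                    specificity = r$specificities)
head(curve)

r$auc                    # present because auc=TRUE by default
```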

Pipelines

The roc function can be used in pipelines, for instance with dplyr or magrittr. The roc.data.frame method supports both standard and non-standard evaluation (NSE):

library(dplyr)
# Standard evaluation:
aSAH %>%
    filter(gender == "Female") %>%
    roc("outcome", "s100b")
# Non-Standard Evaluation:
aSAH %>%
    filter(gender == "Female") %>%
    roc(outcome, s100b)

For tasks involving programming and variable column names, the roc_ function provides standard evaluation:

# Standard evaluation:
aSAH %>%
    filter(gender == "Female") %>%
    roc_("outcome", "s100b")
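
When the column names are only known at run time, roc_ can be called inside a loop. The following sketch assumes the `aSAH` data set and its `s100b` and `ndka` columns; the variable names `response_col` and `predictor_cols` are hypothetical.

```r
library(pROC)
data(aSAH)

# Column names held in variables, e.g. chosen at run time.
response_col <- "outcome"
predictor_cols <- c("s100b", "ndka")

# One roc object per predictor column, using standard evaluation.
rocs <- lapply(predictor_cols, function(p) {
  roc_(aSAH, response_col, p, quiet = TRUE)
})
names(rocs) <- predictor_cols

# Extract the AUC of each curve as a plain numeric value.
sapply(rocs, function(r) as.numeric(r$auc))
```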

By default it returns the roc object, which can then be piped to the coords function to extract coordinates that can be used in further pipelines.

# Returns thresholds, sensitivities and specificities:
aSAH  %>%
    roc(outcome, s100b) %>%
    coords(transpose = FALSE) %>%
    filter(sensitivity > 0.6, 
           specificity > 0.6)

# Returns all existing coordinates, then select precision and recall:
aSAH  %>%
    roc(outcome, s100b) %>%
    coords(ret = "all", transpose = FALSE) %>%
    select(precision, recall)

Errors

If no control or case observation exists for the given levels of response, no ROC curve can be built and an error is triggered with message “No control observation” or “No case observation”.

If the predictor is not numeric or ordered, as defined by is.numeric or is.ordered, the message “Predictor must be numeric or ordered” is returned.

The message “No valid data provided” is issued when the data wasn't properly passed. Remember you need both response and predictor of the same (not null) length, or both controls and cases. Combinations such as predictor and cases are not valid and will trigger this error.

Infinite values of the predictor cannot always be thresholded by infinity and can cause ROC curves to not reach 0 or 100% specificity or sensitivity. Since version 1.13.0, pROC returns NaN with a warning message “Infinite value(s) in predictor” if predictor contains any infinite values.
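
In scripts, these errors can be caught with tryCatch. The sketch below builds a toy data set with no control observation for the requested levels and captures whatever message pROC raises; the data are invented, and the exact wording of the message may differ between versions.

```r
library(pROC)

# Toy data: every response equals 1, so level 0 (controls) is absent.
response  <- c(1, 1, 1, 1)
predictor <- c(0.2, 0.4, 0.6, 0.8)

msg <- tryCatch({
    roc(response, predictor, levels = c(0, 1), direction = "<", quiet = TRUE)
    NA_character_                      # reached only if no error is raised
  },
  error = function(e) conditionMessage(e)
)
print(msg)
```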

References

Tom Fawcett (2006) “An introduction to ROC analysis”. Pattern Recognition Letters 27, 861–874. DOI: 10.1016/j.patrec.2005.10.010.

Xavier Robin, Natacha Turck, Alexandre Hainard, et al. (2011) “pROC: an open-source package for R and S+ to analyze and compare ROC curves”. BMC Bioinformatics, 12, 77. DOI: 10.1186/1471-2105-12-77.

Examples

data(aSAH)

# Basic example
roc(aSAH$outcome, aSAH$s100b,
    levels=c("Good", "Poor"))
# As levels(aSAH$outcome) == c("Good", "Poor"),
# this is equivalent to:
roc(aSAH$outcome, aSAH$s100b)
# In some cases, ignoring levels could lead to unexpected results
# Equivalent syntaxes:
roc(outcome ~ s100b, aSAH)
roc(aSAH$outcome ~ aSAH$s100b)
with(aSAH, roc(outcome, s100b))
with(aSAH, roc(outcome ~ s100b))

# With a formula:
roc(outcome ~ s100b, data=aSAH)

## Not run: 
library(dplyr)
aSAH %>%
    filter(gender == "Female") %>%
    roc(outcome, s100b)

## End(Not run)

# Using subset (only with formula)
roc(outcome ~ s100b, data=aSAH, subset=(gender == "Male"))
roc(outcome ~ s100b, data=aSAH, subset=(gender == "Female"))

# With numeric controls/cases
roc(controls=aSAH$s100b[aSAH$outcome=="Good"], cases=aSAH$s100b[aSAH$outcome=="Poor"])
# With ordered controls/cases
roc(controls=aSAH$wfns[aSAH$outcome=="Good"], cases=aSAH$wfns[aSAH$outcome=="Poor"])

# Inverted the levels: "Poor" are now controls and "Good" cases:
roc(aSAH$outcome, aSAH$s100b,
    levels=c("Poor", "Good"))

# The result was exactly the same because of direction="auto".
# The following will give an AUC < 0.5:
roc(aSAH$outcome, aSAH$s100b,
    levels=c("Poor", "Good"), direction="<")

# If we are sure about levels and direction auto-detection,
# we can turn off the messages:
roc(aSAH$outcome, aSAH$s100b, quiet = TRUE)

# If we prefer counting in percent:
roc(aSAH$outcome, aSAH$s100b, percent=TRUE)

# Plot and CI (see plot.roc and ci for more options):
roc(aSAH$outcome, aSAH$s100b,
    percent=TRUE, plot=TRUE, ci=TRUE)

# Smoothed ROC curve
roc(aSAH$outcome, aSAH$s100b, smooth=TRUE)
# this is not identical to
smooth(roc(aSAH$outcome, aSAH$s100b))
# because in the latter case, the returned object contains no AUC

pROC documentation built on Aug. 8, 2025, 6:28 p.m.