help.md
In plotROC: Generate Useful ROC Curve Charts for Print and Interactive Use

Usage

The web app comes pre-loaded with an example dataset. Use this example to familiarize yourself with the functions.

To upload your own data, click the "upload data" button. Data files must be comma-delimited, with the first row containing the column names as a header. In Excel, you can produce such a file by saving the spreadsheet as .csv.

The first step is to specify the outcome variable by choosing one from the drop-down list. The outcome variable must be binary, meaning that it can take only two levels, typically 0/1.
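
The file requirements above can be checked before uploading. The following is a minimal sketch in Python, not part of the web app: the function name `check_outcome_column` and the column names `D`, `M1`, and `M2` are hypothetical, chosen only for illustration.

```python
import csv
import io


def check_outcome_column(csv_text, outcome):
    """Return the sorted distinct levels of `outcome`.

    Raises ValueError if the column is missing from the header
    or does not have exactly two levels.
    """
    # DictReader treats the first row as the header of column names.
    reader = csv.DictReader(io.StringIO(csv_text))
    if reader.fieldnames is None or outcome not in reader.fieldnames:
        raise ValueError(f"column {outcome!r} not found in header")
    levels = sorted({row[outcome] for row in reader})
    if len(levels) != 2:
        raise ValueError(f"outcome must have exactly 2 levels, found {levels}")
    return levels


# Hypothetical comma-delimited data: one binary outcome, two markers.
example = "D,M1,M2\n0,1.2,0.4\n1,2.7,1.9\n0,0.8,1.1\n1,3.1,0.7\n"
print(check_outcome_column(example, "D"))
```

Calling the function on a continuous column such as `M1` raises a `ValueError`, which is the situation to avoid when choosing the outcome variable.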

The next step is to specify the marker variable. This can and should be continuous. To create multiple curves on a single plot, check the box labeled "Check to plot multiple curves". You can then select multiple marker variables from the selection box.

Plot customization options are on the lower right. You can add a title and curve labels, adjust the label positioning, and change the number of cutoffs and the font size.

Click the download buttons to save your plots to your computer. Downloading the interactive plot will generate HTML code that can be opened in a web browser. Open this code in a text editor and you can copy it directly into your own web page. Downloading the print plot will save a PDF version.

We welcome your feedback. If you have difficulties or suggestions for improvements, send an email to michael.sachs@nih.gov.

What is an ROC curve?

In medicine and other applications, we often use a biomarker or a test to predict an underlying condition, disease state, or future event. For a binary test and a binary disease state, the following table summarizes the possible errors that one can make when using the test to predict a disease state measured by some gold standard.

| Total population | Condition positive (as determined by "Gold standard") | Condition negative (as determined by "Gold standard") | Prevalence = Σ Condition positive / Σ Total population |
|---|---|---|---|
| Test outcome positive | True positive | False positive | Positive predictive value (PPV) = Σ True positive / Σ Test outcome positive |
| Test outcome negative | False negative | True negative | Negative predictive value (NPV) = Σ True negative / Σ Test outcome negative |
| | True positive fraction (TPF, Sensitivity) = Σ True positive / Σ Condition positive | False positive fraction (FPF) = Σ False positive / Σ Condition negative | Accuracy (ACC) = (Σ True positive + Σ True negative) / Σ Total population |
| | False negative fraction (FNF) = Σ False negative / Σ Condition positive | True negative fraction (TNF, Specificity) = Σ True negative / Σ Condition negative | |

This table was adapted from the ROC curve Wikipedia page.
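
As an illustration of the quantities in the table, the sketch below computes each of them from the four cell counts. The function name `two_by_two_summary` and the counts are made up for the example and do not come from any real study.

```python
def two_by_two_summary(tp, fp, fn, tn):
    """Summary measures for a binary test against a gold standard.

    tp, fp, fn, tn are the counts of true positives, false positives,
    false negatives, and true negatives.
    """
    total = tp + fp + fn + tn
    condition_pos = tp + fn   # column total: condition positive
    condition_neg = fp + tn   # column total: condition negative
    return {
        "prevalence": condition_pos / total,
        "PPV": tp / (tp + fp),        # among test positives
        "NPV": tn / (tn + fn),        # among test negatives
        "TPF": tp / condition_pos,    # sensitivity
        "FPF": fp / condition_neg,
        "FNF": fn / condition_pos,
        "TNF": tn / condition_neg,    # specificity
        "ACC": (tp + tn) / total,
    }


# Hypothetical counts for 100 subjects.
summary = two_by_two_summary(tp=40, fp=10, fn=20, tn=30)
for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```

Note that TPF + FNF = 1 and FPF + TNF = 1, since each pair shares the same denominator.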

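If the test yields a continuous value, say $M$, then a positive test is defined as $M > c$ for some cutoff $c$. Writing $D = 1$ for condition positive and $D = 0$ for condition negative, we now consider measures of accuracy as functions of $c$, i.e.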

$$ TPF(c) = \Pr\{M > c \mid D = 1\} $$

$$ FPF(c) = \Pr\{M > c \mid D = 0\} $$
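
In a sample, these probabilities can be estimated by simple proportions. The sketch below assumes the outcome is coded 0/1 and uses a hypothetical helper `tpf_fpf` with made-up data; it illustrates the definitions and is not the code used by the web app.

```python
def tpf_fpf(markers, outcomes, c):
    """Empirical TPF(c) and FPF(c) for marker values and 0/1 outcomes."""
    pos = [m for m, d in zip(markers, outcomes) if d == 1]
    neg = [m for m, d in zip(markers, outcomes) if d == 0]
    # Proportion of condition-positive subjects with M > c.
    tpf = sum(m > c for m in pos) / len(pos)
    # Proportion of condition-negative subjects with M > c.
    fpf = sum(m > c for m in neg) / len(neg)
    return tpf, fpf


# Hypothetical marker values M and outcomes D.
M = [0.8, 1.2, 1.9, 2.7, 3.1, 2.0]
D = [0, 0, 1, 1, 1, 0]
tpf, fpf = tpf_fpf(M, D, c=1.95)
print(f"TPF = {tpf:.3f}, FPF = {fpf:.3f}")
```

With these data, two of the three condition-positive subjects and one of the three condition-negative subjects exceed the cutoff 1.95.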

The ROC curve is a plot of $TPF(c)$ on the vertical axis against $FPF(c)$ on the horizontal axis as the cutoff $c$ varies. Its purpose is to allow the viewer to assess the accuracy of the test $M$ for any possible value of the cutoff $c$. This aids in deciding what cutoff to use in practice, comparing different tests for the same condition, and evaluating the overall accuracy. A key advantage of our approach is that the values of the cutoffs are visible!
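
To make the construction concrete, the following sketch traces out the empirical ROC points by sweeping the cutoff over the observed marker values, keeping each cutoff alongside its point so it could be shown as a label. The function `roc_points` and the data are hypothetical, and this is an assumption about one straightforward way to do it, not a description of how plotROC computes its curves.

```python
def roc_points(markers, outcomes):
    """Return (cutoff, FPF, TPF) triples for 0/1 outcomes.

    Cutoffs are -infinity (everyone tests positive) followed by
    each distinct observed marker value in increasing order.
    """
    n_pos = sum(1 for d in outcomes if d == 1)
    n_neg = sum(1 for d in outcomes if d == 0)
    cutoffs = [float("-inf")] + sorted(set(markers))
    points = []
    for c in cutoffs:
        # A test is positive when the marker strictly exceeds the cutoff.
        tp = sum(1 for m, d in zip(markers, outcomes) if d == 1 and m > c)
        fp = sum(1 for m, d in zip(markers, outcomes) if d == 0 and m > c)
        points.append((c, fp / n_neg, tp / n_pos))
    return points


# Hypothetical marker values M and outcomes D.
M = [0.8, 1.2, 1.9, 2.7, 3.1, 2.0]
D = [0, 0, 1, 1, 1, 0]
for cutoff, fpf, tpf in roc_points(M, D):
    print(f"c = {cutoff:>5}: FPF = {fpf:.3f}, TPF = {tpf:.3f}")
```

The points run from (1, 1) at the lowest cutoff to (0, 0) at the highest, and both fractions can only decrease as the cutoff increases.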