knitr::opts_chunk$set( echo = TRUE, fig.path = "figures/" )
This is a repository for an R package tidyroc
. tidyroc
is currently under development, and I plan to release it this summer (summer 2019).
You can install tidyroc
with the following command
remotes::install_github("dariyasydykova/tidyroc")
tidyroc
has 3 primary functions:
make_roc()
calculates true positive and false positive rates to plot a receiver operating characteristic (ROC) curvemake_pr()
calculates presicion and recall (true positive rate) to plot a precision-recall curve calc_auc()
calculates the area under an ROC (AUC)tidyroc
is designed to be integrated with the Tidyverse. It is intended to work with broom
, dplyr
, and ggplot2
. Here is a simple use case.
library(tidyverse) library(broom) library(tidyroc) glm(am ~ disp, family = binomial, data = mtcars ) %>% augment() %>% make_roc(predictor = .fitted, known_class = am) %>% ggplot(aes(x = fpr, y = tpr)) + geom_line()
# load tidyverse packages library(tidyverse) library(broom) # load cowplot to change plot theme library(cowplot) # load tidyroc library(tidyroc)
I use the biopsy
dataset from the MASS
package. This dataset contains information about biopsies of breast cancer tumors for 699 patients. I fit two logistic regression models that attempt to predict tumor type (benign or malignant).
# get `biopsy` dataset from `MASS` data(biopsy, package = "MASS") # change column names from `V1`, `V2`, etc. to informative variable names colnames(biopsy) <- c("ID", "clump_thickness", "uniform_cell_size", "uniform_cell_shape", "marg_adhesion", "epithelial_cell_size", "bare_nuclei", "bland_chromatin", "normal_nucleoli", "mitoses", "outcome") # fit a logistic regression model to predict tumor types glm_out1 <- glm( formula = outcome ~ clump_thickness + uniform_cell_shape + marg_adhesion + bare_nuclei + bland_chromatin + normal_nucleoli, family = binomial, data = biopsy ) %>% augment() %>% mutate(model = "m1") # name the model # fit a different logistic regression model to predict tumor types glm_out2 <- glm(outcome ~ clump_thickness, family = binomial, data = biopsy ) %>% augment() %>% mutate(model = "m2") # name the model # combine the two datasets to make an ROC curve for each model glm_out <- bind_rows(glm_out1, glm_out2) # plot the distribution of fitted values to see both models' outcomes glm_out %>% ggplot(aes(x = .fitted, fill = outcome)) + geom_density(alpha = 0.6, color = NA) + scale_fill_manual(values = c("#F08A5D", "#B83B5E")) + facet_wrap(~ model)
Now that I have fitted values, I can make a plot with two ROC curves and a plot with two precision-recall curves. I can also calculate the area under each of the ROC curves and the area under each of the precision-recall curves.
# plot ROC curves glm_out %>% group_by(model) %>% # group to get individual ROC curve for each model make_roc(predictor = .fitted, known_class = outcome) %>% # get values to plot an ROC curve ggplot(aes(x = fpr, y = tpr, color = model)) + geom_line(size = 1.1) + geom_abline(slope = 1, intercept = 0, size = 0.4) + scale_color_manual(values = c("#48466D", "#3D84A8")) + theme_cowplot()
# plot precision-recall curves using the data-frame we generated in the previous example glm_out %>% group_by(model) %>% # group to get individual precision-recall curve for each model make_pr(predictor = .fitted, known_class = outcome) %>% # get values to plot a precision-recall curve ggplot(aes(x = recall, y = precision, color = model)) + geom_line(size = 1.1) + coord_cartesian(ylim = c(0, 1), xlim = c(0, 1)) + scale_color_manual(values = c("#48466D", "#3D84A8")) + theme_cowplot()
glm_out %>% group_by(model) %>% # group to get individual AUC values for each ROC curve make_roc(predictor = .fitted, known_class = outcome) %>% summarise(auc = calc_auc(x = fpr, y = tpr))
glm_out %>% group_by(model) %>% # group to get individual AUC values for each precision-recall curve make_pr(predictor = .fitted, known_class = outcome) %>% summarise(auc = calc_auc(x = recall, y = precision))
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.