knitr::opts_chunk$set(
  echo = TRUE,
  fig.path = "figures/"
)

tidyroc

This is a repository for an R package tidyroc. tidyroc is currently under development, and I plan to release it this summer (summer 2019).

Installation

You can install tidyroc with the following command

remotes::install_github("dariyasydykova/tidyroc")

Usage

tidyroc has 3 primary functions:

tidyroc is designed to be integrated with the Tidyverse. It is intended to work with broom, dplyr, and ggplot2. Here is a simple use case.

library(tidyverse)
library(broom)
library(tidyroc)

glm(am ~ disp, 
  family = binomial,
  data = mtcars
) %>%
  augment() %>%
  make_roc(predictor = .fitted, known_class = am) %>%
  ggplot(aes(x = fpr, y = tpr)) + 
  geom_line()

Example with two logistic regression models

# load tidyverse packages
library(tidyverse)
library(broom)

# load cowplot to change plot theme
library(cowplot)

# load tidyroc
library(tidyroc)

I use the biopsy dataset from the MASS package. This dataset contains information about biopsies of breast cancer tumors for 699 patients. I fit two logistic regression models that attempt to predict tumor type (benign or malignant).

# get `biopsy` dataset from `MASS`
data(biopsy, package = "MASS")

# change column names from `V1`, `V2`, etc. to informative variable names
colnames(biopsy) <- 
    c("ID",
      "clump_thickness",
      "uniform_cell_size",
      "uniform_cell_shape",
      "marg_adhesion",
      "epithelial_cell_size",
      "bare_nuclei",
      "bland_chromatin",
      "normal_nucleoli",
      "mitoses",
      "outcome")

# fit a logistic regression model to predict tumor types
glm_out1 <- glm(
  formula = outcome ~ clump_thickness + uniform_cell_shape + marg_adhesion + bare_nuclei + bland_chromatin + normal_nucleoli,
  family = binomial,
  data = biopsy
) %>%
  augment() %>%
  mutate(model = "m1") # name the model

# fit a different logistic regression model to predict tumor types
glm_out2 <- glm(outcome ~ clump_thickness,
  family = binomial,
  data = biopsy
) %>%
  augment() %>%
  mutate(model = "m2") # name the model

# combine the two datasets to make an ROC curve for each model
glm_out <- bind_rows(glm_out1, glm_out2)

# plot the distribution of fitted values to see both models' outcomes
glm_out %>%
  ggplot(aes(x = .fitted, fill = outcome)) +
  geom_density(alpha = 0.6, color = NA) +
  scale_fill_manual(values = c("#F08A5D", "#B83B5E")) +
  facet_wrap(~ model)

Now that I have fitted values, I can make a plot with two ROC curves and a plot with two precision-recall curves. I can also calculate the area under each of the ROC curves and the area under each of the precision-recall curves.

ROC curve

# plot ROC curves
glm_out %>%
  group_by(model) %>% # group to get individual ROC curve for each model
  make_roc(predictor = .fitted, known_class = outcome) %>% # get values to plot an ROC curve
  ggplot(aes(x = fpr, y = tpr, color = model)) +
  geom_line(size = 1.1) +
  geom_abline(slope = 1, intercept = 0, size = 0.4) +
  scale_color_manual(values = c("#48466D", "#3D84A8")) +
  theme_cowplot()

Precision-recall curve

# plot precision-recall curves using the data-frame we generated in the previous example
glm_out %>%
  group_by(model) %>% # group to get individual precision-recall curve for each model
  make_pr(predictor = .fitted, known_class = outcome) %>% # get values to plot a precision-recall curve
  ggplot(aes(x = recall, y = precision, color = model)) +
  geom_line(size = 1.1) +
  coord_cartesian(ylim = c(0, 1), xlim = c(0, 1)) +
  scale_color_manual(values = c("#48466D", "#3D84A8")) +
  theme_cowplot()

AUC values

glm_out %>%
  group_by(model) %>% # group to get individual AUC values for each ROC curve
  make_roc(predictor = .fitted, known_class = outcome) %>%
  summarise(auc = calc_auc(x = fpr, y = tpr))
glm_out %>%
  group_by(model) %>% # group to get individual AUC values for each precision-recall curve
  make_pr(predictor = .fitted, known_class = outcome) %>%
  summarise(auc = calc_auc(x = recall, y = precision))


dariyasydykova/tidyroc documentation built on May 14, 2019, 11:03 p.m.