In cardiomoon/multipleROC: Package for ROC Analysis with Models with Multiple Predictors

knitr::opts_chunk$set(
  collapse = TRUE,
  message=FALSE,
  warning = FALSE,
  comment = NA,
  fig.height=7,
  fig.width=7,
  dpi=150
)

R package multipleROC is for ROC analysis with models with multiple predictors. You can draw a ROC plot with ggplot2 for models with multiple predictors. You can draw several ROC curves in a plot. You can perform automatic stepwise logistic regression analysis and compare the initial and final models.

Installation

You can install multipleROC package from github.

install.packages("remotes")
remotes::install_github("cardiomoon/multipleROC")

Package loading

library(multipleROC)

Can I differentiate male patients from female patients with their height?

Data radial from package moonBook is a dataset containing demographic data and laboratory data of 115 patients. If you want to differentiate male and female patients with their heights(in centimeter), you may draw exploratory plot first.

library(ggplot2)
library(dplyr)
library(tidyr)

theme_set(theme_bw())
data(radial,package="moonBook")

ggplot(data=radial)+geom_density(aes(x=height,fill=sex),alpha=0.5) 
ggplot(data=radial)+geom_boxplot(aes(x=sex,y=height,fill=sex),alpha=0.5)

As you can see, male patients are taller than female patients. But how can I determine the optimal cutoff value?

Make the first ROC plot

To draw a ROC curve is one of the useful method to determine the optimal cutoff value. You can perform ROC analysis using the following R command.The following R command makes an object of class multipleROC and makes a plot.

data(radial,package="moonBook")
x=multipleROC(male~height,data=radial)

Sensitivity, Specificity, PPV and NPV

You can see the ROC plot with sensitivity(Sens), specificity(Spec), positive predictive value(PPV) and negative predictive value(NPV) with the cutoff value 161. These rates comes from the following table.

table(radial$height>=161,radial$male)

If you use the height $\geq$ 161cm to differentiate male patients from female patients, the true positive(TP) is 50(out of 114), true negative(TN) is 49, false positive(FP) is 7 and the false negative(FN) is 8 patients. You can calculate sensitivity, specificity, positive predictive value and negative predictive values by manual. Sensitivity measures how often a test correctly generates a positive result for people who have the condition that’s being tested for (also known as the “true positive” rate).

\begin{equation} Sensitivity=\frac{TP}{Male\, Patients}=\frac{TP}{FN+TP}=\frac{50}{8+50}=86.2 \% \end{equation}

Specificity measures a test’s ability to correctly generate a negative result for people who don’t have the condition that’s being tested for (also known as the “true negative” rate).

\begin{equation} Specificity=\frac{TN}{Female\, Patients}=\frac{TN}{TN+FP}=\frac{49}{49+7}=87.5 \% \end{equation}

Positive predictive value(PPV) is calculated as follows:

\begin{equation} PPV=\frac{TP}{Pts\,with\, height \geq 161cm}=\frac{TP}{TP+FP}=\frac{50}{50+7}=87.7\% \end{equation}

Negative predictive value(PPV) is calculated as follows:

\begin{equation} NPV=\frac{TN}{Pts\,with\, height < 161cm}=\frac{TN}{TN+FN}=\frac{49}{49+8}= 85.9 \% \end{equation}

The optimal cutoff value

You can calculate the sensitivity, specificity for all the height data of radial. The radial data are from 115 patients, but 1 patient's height is missed. But the length of unique values of radial$height is 39 due to duplication.

length(unique(radial$height))

You can calculate all the ratio by hand. For your convenience, you can use calSens() function included in multipleROC package.

calSens=function(x,y){
     newx=sort(c(unique(x),max(x,na.rm=TRUE)+1))
     completeTable=function(res){
          if(nrow(res)==1){
               res1=matrix(c(0,0),nrow=1)
               temp=setdiff(c("TRUE","FALSE"),attr(res,"dimnames")[[1]][1])
               if(temp=="FALSE") res=rbind(res1,res)
               else res=rbind(res,res1)
               res
          }
          res
     }

     getSens=function(cut){
          res=table(x>=cut,y)
          res=completeTable(res)
          sens=res[2,2]/sum(res[,2])
          spec=res[1,1]/sum(res[,1])
          ppv=res[2,2]/sum(res[2,])
          npv=res[1,1]/sum(res[1,])
          data.frame(x=cut,sens=sens,spec=spec,fpr=1-spec,ppv=ppv,npv=npv,sum=sens+spec)
     }
     map_dfr(newx,function(cut){getSens(cut)})
}

result=calSens(radial$height,radial$male)
result

The last x value of data.frame result is a arbitary value to ensure the ROC curve start from c(0,0).

You can make exploratory plot of sensitivity and specificity. First, You have to transform the data in long from.

longdf <- result %>% pivot_longer(cols=sens:spec,names_to = "rate")
ggplot(data=longdf,aes(x=x,y=value,color=rate))+geom_line()

As you can see, the higher the sensitivity, the lower the specificity. The optimal cutoff value is determined where the sum of specificity and sensitivity is the highest.

result[which.max(result$sum),]

You can see when the height is 161cm, the sum of sensitivity and specificity is the highest. So the best cutoff value of height is 161cm. With the result, you can draw the ROC plot.

result=result[order(result$sens),]
ggplot(data=result,aes(x=fpr,y=sens))+geom_line()

The optimal value obtained by generalized linear model

Look at the first ROC plot again. In the ROC plot, you can see the cutoff value of lr.eta is 0.545. Where does this value come from? This value comes from the generalized linear model.

fit=glm(male~height,data=radial,family=binomial)
fit$fitted.values

You can see the fitted mean values, obtained by transforming the linear predictors by the inverse of the link function. You can use this value to make a ROC plot. Again, you can use calSens() function.

result2=calSens(fit$fitted.values,fit$y)
result2[which.max(result2$sum),]

The optimal fitted value is 0.5451. You can draw ROC curve with this value.

result2=result2[order(result2$sens),]
ggplot(data=result2,aes(x=fpr,y=sens))+geom_line()

What is the relationship of the fitted value of 0.5451 and height 161cm ? As you knows, you can interpretate the logistic equation as follows.

\begin{equation} log(\frac{p}{1-p})=\beta_0+\beta_1*X \end{equation}

This equation is expressed with the following R command.

qlogis(x$cutpoint)=coef(x$fit)[1]+coef(x$fit)[2]*X

So you can calculate the optimal cutpoint of height with the optimal fitted value 0.5451 as follows.

height=(qlogis(x$cutpoint)-unname(coef(x$fit)[1]))/unname(coef(x$fit)[2])
height

Area under curve(AUC)

In the lower right corner, you can see the area under curve(AUC) and the p value from Wilcoxon Rank Sum test. The p value comes from:

wilcox.test(radial$height,radial$male)

The AUC is calculated by the simpleAUC function included in multipleROC package.

simpleAUC <- function(df){
     df=df[order(df$x,decreasing=TRUE),]
     TPR=df$sens
     FPR=df$fpr

     dFPR <- c(diff(FPR), 0)
     dTPR <- c(diff(TPR), 0)

     sum(TPR * dFPR) + sum(dTPR * dFPR)/2
}

simpleAUC(x$df)

Convert to pROC::roc object

You can convert a multipleROC object to a roc object. You can use this object to calculate auc or compare the AUC of two ROC curves.

class(x)
multipleROC2roc=function(x){
     pROC::roc(x$fit$y,x$fit$fitted.values,ci=T)
}

x2 <- multipleROC2roc(x)
class(x2)
pROC::auc(x2)

Multiple ROC curves

You can draw multiple ROC curves in the same plot. First, make a list of multipleROC objects and use the plot_ROC function.

a=multipleROC(male~height,data=radial,plot=FALSE)
b=multipleROC(male~age,data=radial,plot=FALSE)
c=multipleROC(form=male~weight,data=radial,plot=FALSE)
plot_ROC(list(a,b,c),show.eta=FALSE,show.sens=FALSE)

You can make facetted plot with the following R command.

plot_ROC(list(a,b,c),facet=TRUE)

By setting the facet argument TRUE, you can get the facetted plot. Alternatively you can use the facet_grid() function of the package ggplot2.

require(ggplot2)
plot_ROC(list(a,b,c))+facet_grid(no~.)

Models with multiple predictors

You can make ROC plots with multiple predictors.

multipleROC(male~height+weight+age,data=radial)

This plot is made from the following glm object.

fit=glm(male~height+weight+age,data=radial,family=binomial)

Automatic stepwise backward elimination and final model selection

You can use automatic stepwise backward elmination by AIC in multiple glm model.

final=step(fit,trace=0)
summary(final)

You can compare the initial and final model.

anova(final,fit,test="Chisq")

You can draw plot comparing two ROC curves of the initial and the final model.

step_ROC(male~weight+height+age,data=radial)

You can get the anova table.

step_ROC(male~weight+height+age,data=radial,plot=FALSE)

cardiomoon/multipleROC documentation built on Sept. 9, 2023, 7:41 a.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

cardiomoon/multipleROC
Package for ROC Analysis with Models with Multiple Predictors

In cardiomoon/multipleROC: Package for ROC Analysis with Models with Multiple Predictors

Installation

Package loading

Can I differentiate male patients from female patients with their height?

Make the first ROC plot

Sensitivity, Specificity, PPV and NPV

The optimal cutoff value

The optimal value obtained by generalized linear model

Area under curve(AUC)

Convert to pROC::roc object

Multiple ROC curves

Models with multiple predictors

Automatic stepwise backward elimination and final model selection

R Package Documentation

Browse R Packages

We want your feedback!

cardiomoon/multipleROC Package for ROC Analysis with Models with Multiple Predictors

In cardiomoon/multipleROC: Package for ROC Analysis with Models with Multiple Predictors

Installation

Package loading

Can I differentiate male patients from female patients with their height?

Make the first ROC plot

Sensitivity, Specificity, PPV and NPV

The optimal cutoff value

The optimal value obtained by generalized linear model

Area under curve(AUC)

Convert to pROC::roc object

Multiple ROC curves

Models with multiple predictors

Automatic stepwise backward elimination and final model selection

R Package Documentation

Browse R Packages

We want your feedback!

cardiomoon/multipleROC
Package for ROC Analysis with Models with Multiple Predictors